The Art of Translation
What if your model could just look where it needed to? Exploring attention mechanisms in sequence-to-sequence learning and how they revolutionize machine translation.
Hello! I'm a Master's student at Georgia Tech advised by Prof. Bo Dai. My research currently focuses on Long Context Modeling and Reinforcement Learning. For the foreseeable future, I hope to work on RL post-training and/or AI for Science.
Computer Science and Mathematics at Georgia Tech. Focused on Machine Learning and Mathematical Modeling.
Most recently I led the software stack at Swish Robotics. Before that, I worked on Fraud Detection Models and Infrastructure at Credit Karma. I was an Indian National Math Olympiad finalist and earned a top-300 rank on the Putnam Math Contest.
Benchmarking Flash Attention v1 and v2 in Triton against a naive PyTorch implementation of Scaled Dot-Product Attention and Multi-Head Attention.
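For reference, the naive baseline looks roughly like this: a minimal PyTorch sketch of scaled dot-product attention that materializes the full attention matrix, which is exactly the memory traffic Flash Attention is designed to avoid. The function name and shapes are illustrative, not the benchmark's actual code.

```python
import math
import torch

def naive_sdpa(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, head_dim)
    # Materializes the full (seq_len x seq_len) score matrix in HBM,
    # so memory grows quadratically with sequence length.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```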
The Exploration vs. Exploitation Dilemma and a brief introduction to Reinforcement Learning
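The textbook setting for this dilemma is the multi-armed bandit. A minimal epsilon-greedy sketch, where the agent mostly exploits its current best estimate but occasionally explores a random arm (all names and parameters here are illustrative, not taken from the post):

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=10_000):
    """Epsilon-greedy on a Gaussian multi-armed bandit."""
    n_arms = len(true_means)
    counts = [0] * n_arms        # pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    for _ in range(steps):
        if random.random() < epsilon:  # explore: random arm
            arm = random.randrange(n_arms)
        else:                          # exploit: current best estimate
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = random.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # incremental update of the running mean
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts
```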
Optimizer Routing Can Benefit Scientific Discovery
Building RL training infrastructure and environments for Machine Learning Engineering tasks on top of MLE-Dojo. Running evaluations to test whether benchmark failures are actually bottlenecked by long context.
Worked with Prof. Clio Andris on geographic visualizations and spatial information theory.
A Design Space of Node Placement Methods for Geospatial Network Visualizations.
This is a small study of ablations of the Muon optimizer on FAIR Chem's GemNet-OC architecture, using the OMat24 dataset. The goal was to see whether routing Muon to specific blocks helps this architecture.
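Routing here means splitting parameters across optimizers. A minimal sketch of one way to do it, assuming a torch.optim-style Muon implementation; the `muon` import, the name-based block filter, and the learning rates are all assumptions for illustration:

```python
import torch
from muon import Muon  # assumes a torch.optim-style Muon implementation

def build_routed_optimizers(model, muon_lr=0.02, adamw_lr=3e-4):
    # Route: 2-D weight matrices in the targeted blocks go to Muon;
    # everything else (embeddings, norms, biases) goes to AdamW.
    muon_params, adamw_params = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        if p.ndim >= 2 and "interaction" in name:  # illustrative block filter
            muon_params.append(p)
        else:
            adamw_params.append(p)
    return [
        Muon(muon_params, lr=muon_lr),
        torch.optim.AdamW(adamw_params, lr=adamw_lr),
    ]
```

Each training step then does a single backward pass and calls `step()` on both optimizers.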
Implemented LoRA adapters on a 7B Mistral model and distilled it into a smaller student model.
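With Hugging Face's peft library, attaching LoRA adapters takes only a few lines. A sketch under assumed hyperparameters; the checkpoint, rank, alpha, and target modules are illustrative, not necessarily the ones used in the project:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapter weights train
```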
Trained a sequence-to-sequence model with and without the attention mechanism to translate natural language to Python snippets.
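At each decoding step, attention lets the decoder compute a context vector as a softmax-weighted sum of encoder states, so the model can "look where it needs to" in the source sequence. A minimal sketch of Luong-style dot-product attention; the project's actual scoring function may differ:

```python
import torch
import torch.nn.functional as F

def luong_attention(decoder_hidden, encoder_outputs):
    # decoder_hidden: (batch, hidden)        -- current decoder state
    # encoder_outputs: (batch, src_len, hidden)
    # Dot-product score of the decoder state against every source position.
    scores = torch.bmm(encoder_outputs, decoder_hidden.unsqueeze(2)).squeeze(2)
    weights = F.softmax(scores, dim=1)  # (batch, src_len)
    # Context vector: weighted sum of the encoder states being attended to.
    context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)
    return context, weights
```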