Arjun Mehta
Machine Learning Engineer — Serving & MLOps
Seattle, WA · arjunmehta.ml · github.com/arjunmehta · linkedin.com/in/arjunmehta
Summary
ML engineer with 5 years shipping models to production at scale. Held p95 inference under 80ms at 8K QPS, cut training-serving skew below 1%, and stood up a RAG pipeline that nearly doubled answer relevance over a keyword baseline (78% vs 41%). Strong in PyTorch, Triton, MLflow, and Kubernetes-based model serving.
Skills
- Modeling: PyTorch · TensorFlow · ONNX · fine-tuning · embeddings
- Serving: Triton · Kubernetes · Ray · gRPC
- MLOps: MLflow · Feast (feature store) · Airflow · model monitoring
- GenAI: RAG · pgvector · LangChain · evals
Experience
Machine Learning Engineer — Cascade AI
2021 — Present
Seattle, WA
- Owned end-to-end serving of 6 ranking models on Triton + Kubernetes; held p95 inference latency under 80ms at 8K QPS.
- Built a Spark-based feature pipeline (200+ features) backed by a Feast feature store, cutting training-serving skew from 12% to under 1%.
- Stood up a RAG pipeline (LangChain + pgvector) for internal search; on a 240-query eval set it scored 78% answer relevance vs 41% for a keyword-search baseline.
- Cut model-serving GPU cost 30% by introducing dynamic batching and right-sizing inference nodes.
ML Engineer, Associate — Datawave
2019 — 2021
Remote
- Productionized a fraud-detection model behind a gRPC service handling 3K RPS with a 99.95% uptime SLO.
- Built CI/CD for models (MLflow registry + automated eval gates), cutting release time from days to hours.
Education
M.S. Computer Science (Machine Learning) — University of Washington
2017 — 2019
Certifications
TensorFlow Developer Certificate · AWS Certified Machine Learning — Specialty