Are you passionate about pushing the boundaries of AI and deploying cutting-edge LLM systems at scale? Do you thrive on solving complex problems that combine low-latency inference, agent orchestration, and Retrieval-Augmented Generation (RAG)? If so, Pentimenti AI, a visionary leader in the third wave of AI innovation, wants you to join its growing remote-first team.
As a Senior MLOps & LLM Engineer, you’ll play a mission-critical role in designing and scaling next-gen “knowledge robots”: autonomous, intelligent agents built on top of large language models. This is more than a job; it’s a chance to own and shape the future of agentic platforms and turn world-class AI research into real-world, high-impact applications.
What You’ll Be Doing
- Lead the agentic and RAG product roadmap: design, prototype, and deploy LLM agents (planner-executor, multi-agent, tool-calling) optimized for <800ms P95 latency.
- Build and optimize scalable RAG pipelines: choose embedding strategies, architect vector DB schemas, implement hybrid retrieval systems, and ensure factual accuracy with robust evaluations and guardrails.
- Fine-tune and align large models: apply PEFT/LoRA, reinforcement learning from human feedback (RLHF), and safety techniques to optimize performance and alignment.
- Enhance model inference efficiency: implement quantization (INT4/INT8) and speculative decoding, and use serving frameworks such as TensorRT-LLM, Ray Serve, or vLLM to cut cost per token by ≥40%.
- Mentor and lead a high-performing engineering team; establish CI/CD workflows, observability tools, and data governance processes.
- Collaborate cross-functionally with product and design teams to translate research into seamless, user-first experiences.
What You’ll Bring
Required Skills & Experience
- 5+ years of experience in software or ML engineering, with 2+ years working directly on LLM/NLP products at scale.
- Proficiency in Python and deep learning frameworks such as PyTorch, with familiarity with TensorFlow or JAX.
- Deep knowledge of LangChain, LlamaIndex, and frameworks for agentic design (e.g., CrewAI).
- Strong hands-on experience with RAG pipelines, vector databases (Weaviate, Pinecone, Qdrant), and hybrid search strategies.
- Familiarity with model optimization techniques such as quantization, distillation, and GPU kernel tuning.
- Solid background in cloud infrastructure (AWS, GCP), Kubernetes, Ray, and tools like SageMaker or Vertex AI.
- Proven ability to write clear design documentation, lead projects, and guide technical teams.
Bonus Points For
- Experience with multimodal agents (vision-language, audio-language).
- Contributions to open-source libraries like LangChain, Weaviate, Pinecone, Triton, or vLLM.
- Background in privacy-preserving ML (e.g., federated learning, differential privacy).
- Experience with pgvector, Prometheus/Grafana for observability, or Terraform/Pulumi for infrastructure as code (IaC).
Your Success Metrics (First 6 Months)
- Ship a production-grade RAG pipeline with <800ms P95 latency and ≥90% factual accuracy.
- Reduce inference cost per 1K tokens by at least 40%.
- Author a technical blog post or whitepaper on agent orchestration work that improves tool reliability by ≥25%.
- Build and mentor a team of 3–5 engineers, and implement structured evaluation harnesses and model CI/CD pipelines.
Tech Stack You’ll Work With
- Languages & Frameworks: Python, PyTorch, TensorFlow, JAX
- Agentic Tools: LangChain, LlamaIndex, CrewAI
- Infra & MLOps: Kubernetes, Ray Serve, Terraform, SageMaker, GCP Vertex AI
- Vector Databases: Weaviate, Pinecone, Qdrant, pgvector
- Model Optimization: TensorRT-LLM, vLLM, quantization
- Monitoring: Prometheus, Grafana
The Hiring Process
- Intro Chat: Learn about the vision, values & culture.
- Deep Dive: Solve a real-world agent/RAG challenge in our repo.
- Research Case Study: Present your past optimization work or share a future-facing roadmap.
- Leadership Interview: Connect with founders to discuss team fit and long-term impact.
- Offer: Fast-tracked within 2 weeks for top candidates.
How to Apply
- Click "Apply" and register or log in on Uplers' portal.
- Fill out the screening form and upload your latest resume.
- Complete the assessments to maximize your visibility and increase your chances of being shortlisted.
Ready to Take the Leap?
If you're a builder at heart, thrive in a research-to-production environment, and want to help shape the next generation of AI-powered systems, apply today and help redefine how the world interacts with information.