Senior MLOps & LLM Engineer

Lucknow, Uttar Pradesh, India
Apr 22, 2025
Apr 22, 2026
Remote
Full-Time
2 Years
Job Description

Are you passionate about pushing the boundaries of AI and deploying cutting-edge LLM systems at scale? Do you thrive on solving complex problems that combine low-latency inference, agent orchestration, and Retrieval-Augmented Generation (RAG)? If yes, Pentimenti AI, a visionary leader in the third wave of AI innovation, wants you to join their growing remote-first team.

As a Senior MLOps & LLM Engineer, you’ll play a mission-critical role in designing and scaling next-gen “knowledge robots”: autonomous, intelligent agents built on top of large language models. This is more than a job. It’s a chance to own and shape the future of agentic platforms and turn world-class AI research into real-world, high-impact applications.

What You’ll Be Doing

  1. Lead the agentic and RAG product roadmap. Design, prototype, and deploy LLM agents (planner-executor, multi-agent, tool-calling) optimized for <800ms P95 latency.
  2. Build and optimize scalable RAG pipelines. Choose embedding strategies, architect vector DB schemas, implement hybrid retrieval systems, and ensure factual accuracy with robust evaluations and guardrails.
  3. Fine-tune and align large models. Apply PEFT/LoRA, reinforcement learning from human feedback (RLHF), and safety techniques to optimize performance and alignment.
  4. Enhance model inference efficiency. Implement quantization (INT4/INT8), speculative decoding, and frameworks like TensorRT-LLM, Ray Serve, or vLLM to cut cost per token by ≥40%.
  5. Mentor and lead a high-performing engineering team; establish CI/CD workflows, observability tools, and data governance processes.
  6. Collaborate cross-functionally with product and design teams to translate research into seamless, user-first experiences.

What You’ll Bring

Required Skills & Experience

  • 5+ years of experience in software or ML engineering, with 2+ years working directly on LLM/NLP products at scale.
  • Proficiency in Python and deep learning frameworks like PyTorch, plus familiarity with TensorFlow or JAX.
  • Deep knowledge of LangChain, LlamaIndex, and frameworks for agentic design (e.g., CrewAI).
  • Strong hands-on experience with RAG pipelines, vector databases (Weaviate, Pinecone, Qdrant), and hybrid search strategies.
  • Familiarity with model optimization techniques such as quantization, distillation, and GPU kernel tuning.
  • Solid background in cloud infrastructure (AWS, GCP), Kubernetes, Ray, and tools like SageMaker or Vertex AI.
  • Proven ability to write clear design documentation, lead projects, and guide technical teams.

Bonus Points For

  • Experience with multimodal agents (vision-language, audio-language).
  • Contributions to open-source libraries like LangChain, Weaviate, Pinecone, Triton, or vLLM.
  • Background in privacy-preserving ML (e.g., federated learning, differential privacy).
  • Experience with pgvector, Prometheus/Grafana for observability, and Terraform or Pulumi for infrastructure as code (IaC).

Your Success Metrics (First 6 Months)

  • Ship a production-grade RAG pipeline with <800ms P95 latency and ≥90% factual accuracy.
  • Reduce inference cost per 1K tokens by at least 40%.
  • Improve tool-calling reliability in agent orchestration by ≥25%, and document the approach in a technical blog post or whitepaper.
  • Build and mentor a team of 3–5 engineers, and implement structured evaluation harnesses and model CI/CD pipelines.

Tech Stack You’ll Work With

  1. Languages & Frameworks. Python, PyTorch, TensorFlow, JAX
  2. Agentic Tools. LangChain, LlamaIndex, CrewAI
  3. Infra & MLOps. Kubernetes, Ray Serve, Terraform, SageMaker, GCP Vertex AI
  4. Vector Databases. Weaviate, Pinecone, Qdrant, pgvector
  5. Model Optimization. TensorRT-LLM, vLLM, quantization
  6. Monitoring. Prometheus, Grafana

The Hiring Process

  1. Intro Chat. Learn about the vision, values & culture.
  2. Deep Dive. Solve a real-world agent/RAG challenge in our repo.
  3. Research Case Study. Present your past optimization work or share a future-facing roadmap.
  4. Leadership Interview. Connect with founders to discuss team fit and long-term impact.
  5. Offer. Fast-tracked within 2 weeks for top candidates.

How to Apply

  • Click "Apply" and register/login on Uplers' portal.
  • Fill out the screening form and upload your latest resume.
  • Complete the assessments to maximize your visibility and improve your chances of being shortlisted.

Ready to Take the Leap?

If you're a builder at heart, thrive in a research-to-production environment, and want to help shape the next generation of AI-powered systems, apply today and help redefine how the world interacts with information.