We are seeking an AI Ops Engineer with at least 4 years of hands-on experience in AI Ops, MLOps, or related fields. In this role, you will be instrumental in building and maintaining robust AI/ML pipelines and production-grade systems. You’ll work closely with data scientists, engineers, and platform teams to deploy, monitor, and manage machine learning models and services in scalable, reliable, and secure environments.
This is a highly technical, collaborative, and impactful position that requires a deep understanding of modern DevOps practices, cloud infrastructure (especially Azure), and AI/ML lifecycle management.
Experience: 4+ years
What You’ll Do
- Design, implement, and manage end-to-end MLOps pipelines using Azure ML and other Azure-based services.
- Build scalable APIs and microservices using Python and FastAPI to expose AI/ML models for consumption across the organization.
- Develop and deploy containerized applications using Docker and orchestrate them using Kubernetes or Azure Kubernetes Service (AKS).
- Create reproducible development environments using DevContainers to streamline collaboration and onboarding.
- Integrate CI/CD workflows using tools like Jenkins, GitHub Actions, or Azure DevOps to automate model training, testing, deployment, and monitoring.
- Collaborate with Data Scientists to transition prototypes into robust production-ready solutions.
- Monitor and optimize the performance of ML systems in production, implementing observability best practices and automating alerting mechanisms.
- Ensure adherence to best practices in security, compliance, and governance across the AI/ML development and deployment lifecycle.
- Stay up to date with industry trends in AI Ops and proactively suggest improvements to existing infrastructure and workflows.
What We’re Looking For
- Strong programming skills in Python, with a focus on writing clean, testable, and scalable code.
- Experience with FastAPI or similar Python-based web frameworks to build lightweight RESTful APIs.
- Deep hands-on experience with Kubernetes or Azure Kubernetes Service (AKS) for orchestrating scalable applications.
- Proven working knowledge of Microsoft Azure, particularly Azure Machine Learning (including its MLflow integration) and supporting services such as Azure Blob Storage and Azure Functions.
- Proficiency in using DevContainers to standardize and simplify development environments across teams.
- Solid understanding and practical application of CI/CD tools like Jenkins, GitHub Actions, or Azure DevOps.
- Experience working with container technologies such as Docker in production environments.
- A strong grasp of DevOps principles including Infrastructure as Code (IaC), configuration management, and continuous monitoring.
- Excellent problem-solving skills, attention to detail, and the ability to thrive in a fast-paced, ever-evolving environment.
- Effective communication and collaboration skills, especially when working with cross-functional teams.
Nice to Have
- Experience with Terraform, Helm charts, or other IaC and deployment tooling.
- Familiarity with monitoring tools such as Prometheus, Grafana, Azure Monitor, etc.
- Exposure to data versioning and experiment tracking tools like MLflow or DVC.
- Background in data science or machine learning fundamentals.
Why Join Us?
- Work with cutting-edge technology in AI and DevOps.
- Be part of a dynamic, forward-thinking team that values innovation and collaboration.
- Enjoy flexibility, autonomy, and an environment that promotes learning and growth.
- Contribute to projects that have real-world business impact across industries.
Ready to shape the future of AI operations? Apply now and bring your expertise to a team that's redefining how intelligent systems are built and deployed.