DevOps Engineer

Bangalore, Karnataka, India
Mar 31, 2025
Mar 31, 2026
Remote
Full-Time
2 Years
Job Description

We are looking for an experienced Site Reliability Engineer (SRE) / DevOps Engineer to join our team and contribute to the reliability, scalability, and efficiency of our systems. As part of our engineering team, you will work on designing, implementing, and managing cloud infrastructure, ensuring seamless deployment pipelines, and maintaining observability across various applications and environments.

This role requires expertise in observability tools, cloud technologies, infrastructure as code, and automation. You will be responsible for troubleshooting complex issues, optimizing system performance, and driving operational excellence across the organization.

Key Responsibilities

System Architecture & Infrastructure

  • Design, implement, and maintain scalable, secure, and highly available cloud infrastructure.
  • Define best practices and benchmarks for non-functional requirements (NFRs) such as scalability, reliability, and security.
  • Work with Infrastructure as Code (IaC) tools such as Terraform and Ansible to automate deployments and infrastructure provisioning.
  • Optimize system performance and troubleshoot issues related to system architecture and cloud environments.

Observability & Monitoring

  • Implement and maintain observability tools such as New Relic, Datadog, Dynatrace, Prometheus, and Grafana to monitor system health and performance.
  • Set up logging and monitoring solutions using ELK (Elasticsearch, Logstash, Kibana) or EFK (Elasticsearch, Fluentd, Kibana) stacks.
  • Analyze system metrics and logs to proactively identify performance bottlenecks and troubleshoot incidents.

CI/CD & Automation

  • Develop and maintain CI/CD pipelines using tools such as Jenkins and GitLab CI/CD, and automate software deployment workflows.
  • Improve build and deployment processes to ensure faster and more reliable software releases.
  • Work with development teams to integrate automated testing, security scans, and compliance checks into CI/CD pipelines.

Site Reliability & Incident Management

  • Apply SRE principles to maintain system reliability, availability, and performance.
  • Investigate and resolve complex incidents, conduct root cause analysis (RCA), and implement long-term fixes.
  • Define and maintain service level indicators (SLIs), service level objectives (SLOs), and error budgets.
  • Develop auto-healing mechanisms and optimize failover strategies to minimize downtime.

Security & Compliance

  • Implement security best practices in infrastructure, access control, and monitoring.
  • Apply DevSecOps principles to ensure compliance with industry standards and security guidelines.
  • Ensure proper role-based access control (RBAC) and secrets management.

Technical Leadership & Collaboration

  • Collaborate with cross-functional teams, including developers, QA engineers, and product managers, to define technical solutions.
  • Conduct code reviews, architectural reviews, and performance tuning sessions.
  • Lead proof-of-concepts (PoCs) to validate new technologies and best practices before full-scale adoption.
  • Act as a mentor to junior engineers and share knowledge through technical documentation and training sessions.

Required Skills & Qualifications

Must-Have Skills

  1. Observability Tools. Experience with New Relic, Datadog, Dynatrace, Prometheus, Grafana, and the ELK/EFK stack.
  2. Cloud Technologies. Expertise in AWS, Azure, or Google Cloud Platform (GCP).
  3. Containerization & Orchestration. Strong knowledge of Docker and Kubernetes.
  4. Infrastructure as Code (IaC). Hands-on experience with Terraform and Ansible.
  5. Linux Administration. Proficiency in managing and troubleshooting Linux-based environments.
  6. CI/CD Pipelines. Experience with Jenkins, GitLab CI/CD, and deployment automation.
  7. SRE Principles. Knowledge of SLIs, SLOs, error budgets, and incident management.
  8. Security Best Practices. Understanding of RBAC, IAM, and cloud security measures.

Preferred Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field.
  • Certification in AWS, Azure, Kubernetes, or Terraform is a plus.
  • Experience with serverless computing (AWS Lambda, Azure Functions, Google Cloud Functions).
  • Familiarity with machine learning infrastructure and AI-driven observability solutions.

Why Join Nagarro?

  1. Innovative Work Culture. Be part of a team that thrives on innovation, collaboration, and cutting-edge technologies.
  2. Global Exposure. Work with clients and teams across multiple countries and industries.
  3. Learning & Development. Continuous learning opportunities, certifications, and mentorship programs.
  4. Flexible Work Environment. Remote and hybrid work options to maintain a healthy work-life balance.
  5. Career Growth. Clear career progression paths and opportunities to take on leadership roles.

How to Apply?

If you are excited about this opportunity and have the skills we are looking for, apply now and be a part of Nagarro’s dynamic and forward-thinking engineering team!

Join us and make a difference in the world of digital product engineering!
