DevOps Engineer

Gurugram

Published 22 hours ago

Role Overview:

We are looking for an experienced DevOps engineer who can own and scale production infrastructure end-to-end, from CI/CD and IaC to observability and incident response. You’ll lead design docs, harden reliability and security, and drive cost and performance efficiency.

What You’ll Do 

● Architect and maintain CI/CD pipelines (build, test, security scans, deploy, rollback) with quality gates and environment promotions. 

● Design and operate container platforms (ECS/EKS or equivalent), service discovery, blue/green & canary strategies, and autoscaling. 

● Implement Infrastructure as Code (Terraform/CDK/CloudFormation) and keep infrastructure modular, reviewable, and drift-free.

● Build observability: metrics/logs/traces, SLOs/SLIs, dashboards, and actionable alerts; reduce MTTR through runbooks and automation. 

● Champion platform reliability: capacity planning, HA/DR (multi-AZ), backup/restore testing, and change management.

● Own secrets management, IAM least-privilege, network policies, and baseline hardening (CIS where relevant). 

● Drive cost optimization (rightsizing, autoscaling policies, savings plans/spot, storage lifecycle) with monthly reporting. 

● Establish release/incident processes (postmortems, RCAs) and lead remediation to cut change failure rate. 

● Partner with Backend/AI/Frontend teams to productize models/services (GPU pools, batching, caching layers) and streamline developer workflows. 

● Lead design reviews, tech spikes, monitoring, and documentation.

Technical Qualifications 

● 2-3+ years in DevOps/SRE/Platform roles supporting production systems at scale. 

● Strong with AWS: VPC, IAM, ECS/EKS, ALB/NLB, RDS/ElastiCache/object storage, CloudWatch.

● Proficient in Terraform (or CDK/CloudFormation), CI/CD (GitHub/GitLab/Jenkins/Argo) including artifacts and environment promotion. 

● Containers & orchestration: Docker, task definitions/Helm charts, autoscaling, health checks, readiness/liveness probes.

● Observability: Prometheus/Grafana, OpenTelemetry, log pipelines (ELK/CloudWatch/Datadog), alert routing. 

● Networking & security: VPC/Subnets, SGs/NACLs, TLS, DNS, WAF, IAM design, secrets (KMS/Parameter Store/Vault). 

● Scripting/automation in Python/Bash, configuration management (Ansible or equivalent). 

● Proven incident management: on-call practice, runbooks, RCAs, tuning alerts to reduce noise. 

Nice to Have 

● Kubernetes (EKS) production experience, service mesh (Istio/Linkerd), GitOps (ArgoCD/Flux). 

● Image and dependency security (Trivy/Grype/Snyk), SBOMs, policy-as-code (OPA/Conftest). 

● Data platform ops (MySQL/Postgres, PITR, replicas), streaming (Kafka/Kinesis).

● Experience with the corresponding services in Azure.

Startup-Specific Expectations 

● Be comfortable with ambiguity and a fast-paced, evolving environment. 

● Proactively take on varied technical tasks outside your comfort zone. 

● Help reduce operational toil via automation and smarter tooling. 

● Contribute ideas on performance, cost savings, and process improvements.

Full time

Associate

Gurugram

