SRE - AWS
Job Summary
We are looking for an experienced and driven Senior Site Reliability Engineer (SRE) to architect, implement, and maintain robust cloud infrastructure. This role demands a deep understanding of AWS, Kubernetes, ECS, and the ability to build scalable, secure, and highly available infrastructure from scratch. The ideal candidate will be a strong advocate for DevOps principles, automation, and reliability, and will possess the skills to support and optimize complex microservices-based architectures.
Key Responsibilities
Infrastructure Design & Implementation
• Design and build highly scalable, fault-tolerant, and secure cloud infrastructure using AWS, Kubernetes, and ECS.
• Lead efforts in infrastructure as code (IaC) using tools like Terraform or CloudFormation.
• Develop and enforce best practices for infrastructure provisioning, security, and cost optimization.
System Reliability & Performance
• Ensure availability, performance, scalability, and security of production systems.
• Implement observability strategies including monitoring, logging, and alerting using tools such as Prometheus, Grafana, ELK, or Datadog.
• Analyse system performance metrics and proactively identify potential issues and bottlenecks.
DevOps & Automation
• Build and maintain CI/CD pipelines to streamline code deployments across environments.
• Drive automation in infrastructure provisioning, configuration management, and operational tasks.
• Ensure repeatable and reliable deployments using containers and orchestration tools like Kubernetes and ECS.
Service Management
• Own the SRE lifecycle, including incident management, postmortems, root cause analysis, and runbook creation.
• Collaborate closely with development and QA teams to ensure seamless microservices integration, deployment, and lifecycle management.
• Maintain service-level objectives (SLOs), service-level agreements (SLAs), and error budgets.
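For candidates less familiar with the term, an error budget is the amount of unreliability an SLO permits over a given window. The short Python sketch below illustrates the arithmetic under simple assumptions: the SLO is request-based, and the function name, parameters, and sample figures are hypothetical and not taken from our systems.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize a request-based error budget for one SLO window.

    slo_target is a fraction such as 0.999 (99.9% of requests must succeed).
    All names and figures here are illustrative assumptions.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests < 0 or failed_requests < 0:
        raise ValueError("request counts must be non-negative")

    # The budget is the share of requests that may fail without breaching the SLO.
    allowed_failures = (1.0 - slo_target) * total_requests
    remaining_failures = allowed_failures - failed_requests

    # Fraction of the budget already spent; values above 1.0 mean the SLO is breached.
    if allowed_failures > 0:
        budget_consumed = failed_requests / allowed_failures
    else:
        budget_consumed = 0.0 if failed_requests == 0 else float("inf")

    return {
        "allowed_failures": allowed_failures,
        "remaining_failures": remaining_failures,
        "budget_consumed": budget_consumed,
    }


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 250 failures, 99.9% SLO.
    print(error_budget(0.999, 1_000_000, 250))
```

In this hypothetical window roughly a quarter of the budget is spent, which would typically leave room for further releases. A budget near or above 100% consumed is usually a signal to prioritize reliability work.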
Security & Compliance
• Implement and enforce cloud security best practices for networking, identity and access management, and data protection.
• Support audits, compliance assessments, and vulnerability remediation.
• Monitor for security anomalies and work with security teams to respond to threats.
Technical Skills
• 6+ years of hands-on experience in Site Reliability Engineering, DevOps, or Cloud Engineering.
• Expertise in AWS services such as EC2, S3, RDS, IAM, VPC, Lambda, CloudWatch, etc.
• Strong knowledge of Kubernetes and container orchestration best practices.
• Experience managing services on Amazon ECS (Fargate or EC2).
• Proficient in infrastructure-as-code tools like Terraform, CloudFormation, or Pulumi.
• Skilled in scripting languages such as Python, Bash, or Go.
• Solid grasp of networking, load balancing, DNS, and firewall rules in cloud environments.
• Deep understanding of microservices architectures, API gateways, and service meshes.
Soft Skills
• Proven leadership and cross-functional collaboration skills.
• Strong problem-solving and incident-resolution mindset.
• Clear communication, documentation, and stakeholder reporting abilities.
• Passion for continuous improvement and automation.
Preferred Qualifications
• AWS certifications such as AWS Certified DevOps Engineer, Solutions Architect – Professional, or equivalent.
• Familiarity with service meshes like Istio or Linkerd.
• Experience with serverless architectures and event-driven systems.
• Knowledge of regulatory compliance (SOC2, ISO 27001, GDPR) in cloud environments.
Skills – AWS Cloud, CI/CD, EC2, Kubernetes, Grafana, Datadog, Python
Key Responsibilities:
Cloud Platform: GCP
• Infrastructure Automation: Design, implement, and manage infrastructure as code using Terraform to provision and manage GCP resources.
• Container Orchestration: Deploy and manage Kubernetes clusters, ensuring efficient operation of containerized applications.
• Continuous Integration/Continuous Deployment (CI/CD): Develop and maintain CI/CD pipelines using Jenkins to automate application build, test, and deployment processes.
• Containerization: Collaborate with development teams to containerize applications using Docker and manage deployments with Helm Charts.
• Code Quality Assurance: Integrate and manage SonarQube to ensure code quality and security standards are met.
• Monitoring and Logging: Implement and manage monitoring solutions using Datadog to ensure system health, performance, and security.
• Collaboration: Work closely with cross-functional teams, including developers, QA, and operations, to streamline processes and improve productivity.
Requirements:
• Experience: 5+ years in DevOps or cloud engineering roles, with at least 3 years of relevant experience in the specified technologies.
• Technical Proficiency:
o Hands-on experience with GCP services and architecture.
o Proficiency in Terraform for infrastructure as code implementations.
o Strong understanding and experience with Kubernetes and Docker.
o Experience in setting up and managing CI/CD pipelines using Jenkins.
o Familiarity with Helm Charts for application deployment.
o Experience with SonarQube for code quality analysis.
o Proficiency in monitoring and logging tools, particularly Datadog.
• Scripting Skills: Proficiency in scripting languages such as Bash or Python is an added advantage.
• Soft Skills:
o Strong problem-solving abilities and analytical thinking.
o Excellent communication skills, both verbal and written.
o Ability to work collaboratively in a team environment.
o Strong organizational and time management skills.
Skills – Terraform, Kubernetes, Cluster, Docker, GCP, SonarQube