Job Opportunity - SRE - Site Reliability Engineer at Atlanta, Georgia, USA
Email: [email protected]
http://bit.ly/4ey8w48
https://jobs.nvoids.com/job_details.jsp?id=2316505&uid=

From: Iswarya, Smart Tech Link [email protected]
Reply to: [email protected]

Hi,

We have a priority requirement with one of our clients. Kindly review and let me know if you have any questions.

SRE - Kubernetes
Location - Atlanta, GA (onsite 3 days)

Mandatory Areas

Must Have Skills
- Skill 1: 7 Yrs of Exp, Kubernetes, GitLab, Splunk o11y
- Skill 2: 7 Yrs of Exp, Prometheus, Python or Go scripting
- Skill 3: 5 Yrs of Exp, Java troubleshooting, Cisco observability

Responsibilities
- Infrastructure Management: Design, implement, and manage Kubernetes clusters in production environments to ensure high availability and reliability.
- Automation: Build and manage automation tools and scripts for continuous deployment, scaling, and self-healing of applications using Kubernetes and associated tooling (Helm, kubectl, Kustomize).
- Monitoring and Metrics: Implement robust monitoring solutions using Prometheus, Grafana, and other observability tools to track the health of Kubernetes clusters, applications, and services.
- Incident Management: Work with cross-functional teams to respond to incidents, identify root causes, and implement solutions to prevent recurrence.
- CI/CD Pipeline Optimization: Design and maintain continuous integration and deployment pipelines to improve the release cycle and reduce downtime.
- Capacity Planning: Forecast resource needs, scale systems efficiently, and optimize cloud infrastructure to meet growing demand.
- Disaster Recovery: Define and implement strategies for backup, recovery, and failover to ensure data integrity and uptime.
- Collaboration: Partner closely with development teams to help design scalable, resilient, and performant architectures on Kubernetes.
- Security: Ensure that the Kubernetes infrastructure follows best practices for security, including network policies, RBAC, and Pod security policies.

Required Skills & Qualifications
- Experience with Kubernetes: Hands-on experience in deploying and managing Kubernetes clusters (preferably in production environments).
- Cloud Platforms: Strong experience with cloud platforms like AWS, GCP, or Azure, with a focus on Kubernetes as a service (e.g., EKS, GKE, AKS).
- Containerization: Expertise in container technologies like Docker, container orchestration with Kubernetes, and Helm charts.
- Automation Tools: Familiarity with Infrastructure-as-Code tools such as Terraform, Ansible, or CloudFormation.
- Monitoring & Observability: Knowledge of monitoring tools such as Prometheus, Grafana, ELK stack, or similar.
- Networking: Understanding of networking concepts (DNS, Load Balancers, etc.) and how they apply to Kubernetes.
- CI/CD Pipelines: Strong knowledge of CI/CD tools like Jenkins, GitLab CI, or CircleCI.
- Scripting: Proficiency in scripting languages such as Bash, Python, or Go.
- Incident Response & Root Cause Analysis: Experience in managing and resolving production incidents with a focus on improving systems after the event.
- Collaboration & Communication: Excellent communication skills to work in cross-functional teams and interact with stakeholders across the company.

Thanks & Regards
Iswarya | Technical Recruiter
[email protected] | www.smarttechlink.com
08:52 PM 04-Apr-25