| Urgent Hiring || Cloud Infrastructure Site Reliability Engineer || Georgia - New Jersey at Alpharetta, Georgia, USA |
| Email: [email protected] |
|
http://bit.ly/4ey8w48
https://jobs.nvoids.com/job_details.jsp?id=2658133&uid=89c9faccb39545af8687c6af5292d90a

From: pradeep bhondwe <[email protected]>
To: ujjwala.gangwani, Critical-requirement

Hello,

My name is Ujjwala Gangwani, and I work as a Technical Recruiter for K-Tek Resourcing. We are looking for professionals who meet the business requirements below for one of our clients. Please read through the requirements and connect with us if they suit your profile.

Please see the Job Description, and if you are interested, send me your updated resume at [email protected] or give me a call at 8329571776.
LinkedIn: linkedin.com/in/ujjwala-g-38a42b18b

NEED ONLY LOCAL CANDIDATES OF GEORGIA, NEW JERSEY AND NEW YORK
PS: No C2C for any candidates

Title: Cloud Infrastructure Site Reliability Engineer (SRE)
Location: Alpharetta, GA or Berkeley Heights, NJ (5 Days Onsite)

Job Description:
As a Cloud Infrastructure Site Reliability Engineer (SRE) with expertise across multiple public cloud platforms, you will be responsible for managing and operating cloud infrastructure in alignment with the principles of Google's SRE model. Your role will focus on ensuring the reliability, availability, and performance of our cloud services, while driving automation and continuous improvement across production environments. You will collaborate closely with cross-functional teams to strengthen our cloud reliability posture and streamline operations through innovative automation solutions.

Key Responsibilities:
Design, build, and maintain highly available, scalable, and secure cloud infrastructure on platforms such as AWS, GCP, or Azure.
Develop and implement automation for provisioning, monitoring, scaling, and incident response using Infrastructure-as-Code tools (e.g., Terraform, CloudFormation, Ansible).
Monitor system reliability, capacity, and performance; proactively detect and address issues before they impact users.
Respond to production incidents, participate in on-call rotations, and lead post-incident reviews to drive root cause analysis and reliability improvements.
Collaborate with software engineering and security teams to ensure new services and features are production-ready and meet reliability standards.
Build and maintain tools for deployment, monitoring, and operations; automate manual processes to reduce toil.
Document operational processes and system architectures to ensure knowledge sharing and repeatability.
Continuously evaluate and implement new technologies to improve system reliability, security, and efficiency.

Qualifications:
Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
3+ years of experience in software development with proficiency in at least one programming language (e.g., Python, Go, Java, C++).
Experience administering cloud platforms (AWS, GCP, Azure), including networking, security, containerization, storage, data management, and serverless technologies.
Solid understanding of Linux systems, networking fundamentals, virtualized and distributed systems, file systems, and system processes and configurations.
Deep understanding of observability (monitoring, alerting, and logging) tools in cloud environments. Ability to set up and maintain monitoring dashboards, alerts, and logs.
Familiarity with Continuous Integration/Continuous Deployment (CI/CD) tools for automated testing, deployments, provisioning, and observability.
Ability to manage and respond to incidents, perform root cause analysis, and conduct post-mortem reviews.
Understanding of setting, monitoring, and maintaining Service-Level Objectives (SLOs) and Service-Level Agreements (SLAs) for system reliability.

Additional Qualifications a Plus:
Experience working with enterprise-scale financial services or other regulated industries
Certifications: Certified Engineer, DevOps, SRE, CSREF

--
Keywords: cplusplus, continuous integration, continuous deployment, access management, information technology, golang, card, Georgia, New Jersey
|
| 07:33 PM 06-Aug-25 |