| Site Reliability Engineer || Houston, TX, USA |
| Email: [email protected] |
|
http://bit.ly/4ey8w48 https://jobs.nvoids.com/job_details.jsp?id=729328&uid=

From: Pulkit Mathur, Opallios Inc [email protected]
Reply to: [email protected]

The candidate must have senior-level experience deploying and supporting applications on OpenShift/Kubernetes container platforms. The successful candidate will have a strong developer background and the interpersonal skills needed to communicate design requirements and objectives while providing thought leadership to peers and leadership. Candidates should be self-motivated, collaborative IT professionals with a strong background in software development, systems administration, and IT automation.

Responsibilities:
* Maintain the survivability and reliability of critical IT/OT resources.
* Write and build CI/CD pipelines and build/release processes for IT/OT workflow applications.
* Mentor the IT/OT DevOps team on best practices for CI/CD deployments using ADO and Git.
* Perform periodic load and scalability testing to establish baselines, detect drift, and support capacity planning.
* Conduct weekly operational state reviews with SREs, product owners, and development teams, covering performance trends, anomalies, errors, and other availability events.
* Participate in quarterly business and operational reviews to align on roadmaps, development velocity, efficiency, growth trends, etc.
* Plan and execute periodic Disaster Recovery exercises, including both tabletop exercises and simulated failures (fault injection).

Required Qualifications:
* Bachelor's degree and 10 years of IT experience.
* Senior-level (expert) experience with OCP and Kubernetes.
* Familiarity with continuous integration/deployment processes and tools such as IDEs (Eclipse), source code management (Git/Stash), ADO Pipelines, Maven, Nexus artifacts, etc.
* Strong understanding of SRE practices: incident response, change/release management, capacity planning, infrastructure automation, elastic environments, chaos engineering, and blameless postmortems.
* Expertise in application performance monitoring, observability, and proactive alert correlation, including container monitoring and failure-based alerting.
* Scripting experience in languages such as Python and Bash.
* Experience deploying applications on OCP in both public and private clouds.
* Excellent written and oral communication skills.
* Demonstrated ability to explain technical issues to nontechnical audiences.
* Demonstrated ability to communicate at a technical level with technical audiences.
* Strong interpersonal skills; adaptable and able to learn quickly.
* Able to work with limited supervision; excellent time management skills.
* Self-motivated self-starter.
* Ability to work and interact with others in a structured team environment.

Technology Stack:
Experience with at least one technology in each of the following categories:
* Monitoring and Logging Tool(s): AppDynamics, Splunk, ELK Stack, Datadog, Prometheus, AWS CloudWatch/X-Ray, Grafana
* Programming: C# .NET, PowerShell, Python, YAML
* Containers: Docker, Helm charts
* OS: Linux (RHEL, Ubuntu, CentOS)
* Code Repos: Azure Repos, GitHub
* Infrastructure as Code: Terraform, Ansible
* Automation Tools: Jenkins, Chef, Puppet
* Agile: JIRA, SAFe

Desired Qualifications:
* Experience with cloud/virtualization technologies and their management (VMware, AWS, Azure, etc.).
* Knowledge, skills, and abilities to support web server technologies such as Apache, Nginx, and IIS.
* Knowledge, skills, and abilities to automate the creation of Platform as a Service (PaaS) infrastructure using industry-standard tools such as Ansible and Chef.
* Familiarity with the Purdue model for Industrial Control System (ICS) security architecture.
Keywords: csharp continuous integration continuous deployment information technology |
| [email protected] |
| 09:13 PM 09-Oct-23 |