| Site Reliability Engineer || Dallas, TX || Onsite at Dallas, Texas, USA |
| Email: [email protected] |
|
http://bit.ly/4ey8w48
https://jobs.nvoids.com/job_details.jsp?id=1908396&uid=
From: Prashanth B, Tekgence [email protected]
Reply to: [email protected]

Role: Site Reliability Engineer
Duration: 6 months
Location: Dallas, TX
Visa Preference: H4 or L2 EADs only
Need 10+ Years

Job Summary:
As a Site Reliability Engineer (SRE), you will play a vital role in ensuring the reliability and performance of our business functions and technology services. Your expertise in monitoring, automation, and proactive issue resolution will help enhance the user experience and maintain optimal system availability. Join a team that thrives on tackling complex challenges and supporting mission-critical operations.

Key Skills:
Design, implement, and manage comprehensive monitoring solutions for business function availability and performance.
Develop and maintain dashboards and alerting systems to track key metrics and identify potential issues proactively.
Collaborate with development, operations, and product teams to enhance system reliability and address points of failure.
Automate responses to common issues and work towards eliminating manual, repetitive tasks.
Investigate and troubleshoot incidents, performing root cause analysis and implementing long-term solutions.
Implement and maintain best practices for incident management and post-mortem processes.
Ensure high availability and scalability of business functions by improving infrastructure and system design.
Provide expertise in the management and operation of cloud environments, with a focus on AWS or Azure.
Support the development and implementation of business continuity and disaster recovery plans.
Collaborate on projects to improve overall system performance, availability, and security.
Mentor and guide other engineers on best practices for reliability engineering and monitoring tools.

Responsibilities:
Bachelor's degree in computer science, information systems, or equivalent combination of education and experience.
6+ years of experience in Site Reliability Engineering, DevOps, or a related field.
Strong knowledge of monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk).
Proficiency in scripting and automation using languages like Python, Bash, or PowerShell.
Experience with cloud environments such as AWS, Azure, or GCP.
Familiarity with CI/CD pipelines and related tools.
Strong analytical and problem-solving skills, with a focus on root cause analysis and long-term solutions.

Other Qualifications:
Knowledge of incident management frameworks (e.g., ITIL practices).
Strong communication skills for collaborating with cross-functional teams.
Ability to work in a remote team environment and handle on-call duties as needed.

Keywords: continuous integration, continuous deployment, Texas |
| 02:25 AM 07-Nov-24 |