Job Details

Site Reliability Engineer || Dallas, TX || Onsite at Dallas, Texas, USA

http://bit.ly/4ey8w48
https://jobs.nvoids.com/job_details.jsp?id=1908396&uid=

From:

Prashanth B,

Tekgence

prashanth.b@tekgence.com

Reply to:   prashanth.b@tekgence.com

Role: Site Reliability Engineer

Duration : 6 months

Location : Dallas, TX

Visa Preference : H4 or L2 EADs only

Need 10+ years of experience

Job Summary:

As a Site Reliability Engineer (SRE), you will play a vital role in ensuring the reliability and performance of our business functions and technology services. Your expertise in monitoring, automation, and proactive issue resolution will help enhance the user experience and maintain optimal system availability. Join a team that thrives on tackling complex challenges and supporting mission-critical operations.

Key Responsibilities:

              Design, implement, and manage comprehensive monitoring solutions for business function availability and performance.

              Develop and maintain dashboards and alerting systems to track key metrics and identify potential issues proactively.

              Collaborate with development, operations, and product teams to enhance system reliability and address points of failure.

              Automate responses to common issues and work towards eliminating manual, repetitive tasks.

              Investigate and troubleshoot incidents, performing root cause analysis and implementing long-term solutions.

              Implement and maintain best practices for incident management and post-mortem processes.

              Ensure high availability and scalability of business functions by improving infrastructure and system design.

              Provide expertise in the management and operation of cloud environments, with a focus on AWS or Azure.

              Support the development and implementation of business continuity and disaster recovery plans.

              Collaborate on projects to improve overall system performance, availability, and security.

              Mentor and guide other engineers on best practices for reliability engineering and monitoring tools.

Qualifications:

              Bachelor's degree in computer science, information systems, or an equivalent combination of education and experience.

              6+ years of experience in Site Reliability Engineering, DevOps, or a related field.

              Strong knowledge of monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk).

              Proficiency in scripting and automation using languages like Python, Bash, or PowerShell.

              Experience with cloud environments such as AWS, Azure, or GCP.

              Familiarity with CI/CD pipelines and related tools.

              Strong analytical and problem-solving skills, with a focus on root cause analysis and long-term solutions.

Other Qualifications

              Knowledge of incident management frameworks (e.g., ITIL practices).

              Strong communication skills for collaborating with cross-functional teams.

              Ability to work in a remote team environment and handle on-call duties as needed.


02:25 AM 07-Nov-24

prashanth.b@tekgence.com

