Job Details

Hybrid Site Reliability Engineer ANY VISA at Atlanta, Georgia, USA

http://bit.ly/4ey8w48
https://jobs.nvoids.com/job_details.jsp?id=2071977&uid=

Local candidates only: Atlanta, GA (30354)

Job Title: Site Reliability Engineer (SRE) - AWS and Dynatrace

Location: Atlanta, GA (30354) - local candidates only

Duration: 12+ Months

Mode of Interview (MOI): Skype

Need LinkedIn-2 candidate

Position Overview:

As a Site Reliability Engineer (SRE), you will be responsible for designing and maintaining high-performance, scalable, and reliable systems. You will lead efforts in observability by building actionable dashboards using AWS CloudWatch and Dynatrace. Your role will also involve defining Service Level Agreements (SLAs) and Service Level Objectives (SLOs), driving automation for monitoring and alerting, and implementing strong practices for root cause analysis in AWS environments.

Key Responsibilities:

Observability & Monitoring:

- Build, maintain, and optimize observability dashboards using Dynatrace and AWS CloudWatch for real-time insights into system performance and reliability.
- Ensure comprehensive monitoring, alerting, and logging strategies are in place, leveraging AWS native services and Dynatrace for end-to-end observability.

SLA/SLO Management:

- Define, monitor, and manage SLAs, SLOs, and SLIs to meet service reliability and performance requirements.
- Work closely with development and operations teams to align on SLO targets and ensure reliability goals are met.

Root Cause Analysis (RCA):

- Conduct in-depth root cause analysis (RCA) for incidents, identifying key bottlenecks and system failures, particularly in AWS cloud technologies.
- Automate RCA processes and contribute to knowledge-sharing by creating postmortem reports.

Cloud Performance & Reliability:

- Improve the reliability, performance, and scalability of AWS-based systems by developing and implementing best practices in AWS architecture.
- Collaborate with DevOps and engineering teams to improve system infrastructure, deployment pipelines, and cloud automation.

Automation & Optimization:

- Automate repetitive tasks in monitoring, incident management, and alerting, reducing operational toil and manual intervention.
- Lead the effort to build self-healing systems by integrating auto-remediation tools and scripts for issue resolution.

Collaboration & Documentation:

- Partner with cross-functional teams to integrate observability into the development lifecycle, ensuring monitoring and logging requirements are met.
- Document system architecture, monitoring practices, and incident resolution playbooks for future reference.

Required Skills & Experience:

- 5+ years of experience as a Site Reliability Engineer (SRE) or related role with a focus on observability, monitoring, and AWS cloud technologies.
- Strong hands-on experience with Dynatrace for monitoring, root cause analysis, and performance optimization.
- Proficiency in AWS CloudWatch for creating dashboards, logs, alarms, and metrics.
- Demonstrated experience in defining and managing SLAs, SLOs, and SLIs to ensure system reliability.
- In-depth understanding of AWS technologies including EC2, Lambda, RDS, VPC, S3, etc.
- Expertise in root cause analysis and troubleshooting large-scale distributed systems in AWS.
- Experience with automation tools (e.g., Terraform, Ansible, or CloudFormation) for infrastructure and incident management.

Nice-to-Have:

- Familiarity with other monitoring tools such as Prometheus, Grafana, or New Relic.
- Knowledge of scripting languages (e.g., Python, Bash) for automating tasks.
- Experience working in a DevOps or Agile environment.

11:33 PM 10-Jan-25
