Looking for SRE (Observability) Engineer (Remote, USA)

Hi,

Job Title: SRE (Observability) Engineer

Duration: 3 months (with potential extension)

Location: Remote

Job Overview:

We are seeking a highly skilled SRE (Observability) Engineer to join our team, with a strong focus on modern observability practices and tools. The ideal candidate will have hands-on experience in provisioning, configuring, and developing infrastructure solutions with an emphasis on automation, scalability, and reliability. This role blends development, system architecture, and troubleshooting responsibilities, providing opportunities to influence the evolution of our infrastructure. The position is remote and requires passing a HackerEarth assessment to demonstrate proficiency in automation, Python, and general SRE skills.

Responsibilities:

Design and Implement Observability Solutions: Use tools such as Dynatrace, Prometheus, Thanos, or Grafana to create comprehensive system monitoring, including metrics, alerts, and silences.

Automate Infrastructure Tasks: Leverage Chef, Ansible, Terraform, and GitLab CI/CD to automate infrastructure configuration and deployment.

Scripting for Automation: Write scripts using Python, PowerShell, or Bash to automate tasks and streamline operations.

Troubleshoot and Resolve Issues: Use SRE principles to conduct root cause analysis and implement corrective actions for system reliability.

Provision and Configure Cloud Resources: Provision and configure resources using Azure, GCP, or AWS via CLI or APIs.

Documentation and Runbooks: Develop and maintain clear technical documentation, including runbooks, application guides, and system configurations.

System Architecture: Plan, design, and implement scalable and redundant system architecture to meet organizational goals.

Required Skills:

Observability Tools: Proficiency with Dynatrace, Prometheus, Thanos, Grafana, or similar tools for monitoring and observability.

Infrastructure Automation: Expertise in Chef, Ansible, Terraform, and GitLab CI/CD to automate infrastructure tasks.

Scripting Languages: Advanced knowledge of Python, PowerShell, or Bash for system automation.

Cloud Platforms: Experience in provisioning and configuring cloud resources on Azure, GCP, or AWS.

SRE Practices: Strong understanding of root cause analysis, troubleshooting, and applying SRE principles to maintain system reliability.

Documentation: Ability to write detailed and clear technical documentation, including runbooks and system configurations.

System Architecture: Understanding of scalability and redundancy in system architecture design.

Preferred Skills:

Kubernetes: Familiarity with container orchestration.

Linux Administration: Expertise in Linux configuration, package management, and troubleshooting.

Networking: Knowledge of VPCs, proxies, and CDNs, and their integration into scalable systems.

Storage Systems: Understanding of block and object storage configurations.

Other Requirements:

HackerEarth Assessment: Candidates must pass an assessment to validate skills in automation (Chef, Ansible, Terraform), Python, and general SRE.

Remote Work: This is a remote position, and candidates must be able to work from anywhere.

Experience Level: Preference for candidates with at least 5 years of experience in SRE or related roles.

--

Keywords: continuous integration continuous deployment information technology

09:28 PM 02-Dec-24

machavarapu7013@gmail.com
