Looking for SRE (Observability) Engineer (Remote) at Remote, Remote, USA
Email: [email protected]
Hi,

Job Title: SRE (Observability) Engineer
Duration: 3 months (with potential extension)
Location: Remote

Job Overview:
We are seeking a highly skilled SRE (Observability) Engineer to join our team, with a strong focus on modern observability practices and tools. The ideal candidate has hands-on experience provisioning, configuring, and developing infrastructure solutions, with an emphasis on automation, scalability, and reliability. The role blends development, system architecture, and troubleshooting, and offers the opportunity to influence how our infrastructure evolves. The position is remote and requires passing a HackerEarth assessment that demonstrates proficiency in automation, Python, and general SRE skills.

Responsibilities:
Design and Implement Observability Solutions: Use tools such as Dynatrace, Prometheus, Thanos, or Grafana to build comprehensive system monitoring, including metrics, alerts, and alert silences.
Automate Infrastructure Tasks: Use Chef, Ansible, Terraform, and GitLab CI/CD to automate infrastructure configuration and deployment.
Scripting for Automation: Write Python, PowerShell, or Bash scripts to automate tasks and streamline operations.
Troubleshoot and Resolve Issues: Apply SRE principles to perform root cause analysis and implement corrective actions that improve system reliability.
Provision and Configure Cloud Resources: Provision and configure resources on Azure, GCP, or AWS using their CLIs or APIs.
Documentation and Runbooks: Develop and maintain clear technical documentation, including runbooks, application guides, and system configurations.
System Architecture: Plan, design, and implement scalable, redundant system architecture that meets organizational goals.

Required Skills:
Observability Tools: Proficiency with Dynatrace, Prometheus, Thanos, Grafana, or similar monitoring and observability tools.
Infrastructure Automation: Expertise in Chef, Ansible, Terraform, and GitLab CI/CD for automating infrastructure tasks.
Scripting Languages: Advanced knowledge of Python, PowerShell, or Bash for system automation.
Cloud Platforms: Experience provisioning and configuring cloud resources on Azure, GCP, or AWS.
SRE Practices: Strong understanding of root cause analysis, troubleshooting, and applying SRE principles to maintain system reliability.
Documentation: Ability to write detailed, clear technical documentation, including runbooks and system configurations.
System Architecture: Understanding of scalability and redundancy in system architecture design.

Preferred Skills:
Kubernetes: Familiarity with container orchestration.
Linux Administration: Expertise in Linux configuration, package management, and troubleshooting.
Networking: Knowledge of VPCs, proxies, and CDNs, and of integrating them into scalable systems.
Storage Systems: Understanding of block and object storage configurations.

Other Requirements:
HackerEarth Assessment: Candidates must pass an assessment that validates their skills in automation (Chef, Ansible, Terraform), Python, and general SRE.
Remote Work: This is a remote position; candidates may work from any location.
Experience Level: Candidates with at least 5 years of experience in SRE or related roles are preferred.
09:28 PM 02-Dec-24