Site Reliability Engineer (SRE) - Remote - 15+ Years' Experience Required (Remote, USA)
Email: [email protected]
http://bit.ly/4ey8w48
https://jobs.nvoids.com/job_details.jsp?id=2258923&uid=

From:

Laxmivijay,

ProCorp Systems

[email protected]

Reply to: [email protected]

Hello

Hope you're doing well!

Job Title: Site Reliability Engineer (SRE) / DevOps Engineer

Location: Remote

Overview:

We are seeking a highly motivated and experienced Site Reliability Engineer (SRE) / DevOps Engineer to join our team. In this role, you will be responsible for ensuring the reliability, performance, and scalability of our data infrastructure and applications. You will work closely with development and data teams to automate processes, optimize systems, and maintain a highly available environment. A key component of this role will be expertise in Apache Druid and Apache Airflow.

Responsibilities:

Reliability and Availability:

Ensure high availability and reliability of production systems.

Implement and maintain robust monitoring and alerting systems.

Participate in on-call rotations to respond to incidents and outages.

Conduct post-incident reviews and implement preventative measures.

Automation and Infrastructure as Code (IaC):

Automate infrastructure provisioning, configuration, and deployment using IaC tools (e.g., Terraform, Ansible).

Develop and maintain CI/CD pipelines to streamline software releases.

Optimize and automate data pipelines and workflows.

Apache Druid Management:

Manage and optimize Apache Druid clusters for high performance and scalability.

Troubleshoot Druid performance issues and implement solutions.

Design and implement Druid data ingestion and query optimization strategies.

Apache Airflow Orchestration:

Design, develop, and maintain Airflow DAGs for data orchestration and workflow automation.

Monitor Airflow performance and troubleshoot issues.

Optimize Airflow workflows for efficiency and reliability.

Monitoring and Logging:

Implement and maintain comprehensive monitoring and logging solutions (e.g., Prometheus, Grafana, ELK stack).

Analyze metrics and logs to identify performance bottlenecks and potential issues.

Create and maintain dashboards for visualizing system health and performance.

Collaboration and Communication:

Collaborate with development, data, and operations teams to ensure smooth operations.

Communicate effectively with stakeholders regarding system status and incidents.

Document processes and procedures.

Qualifications:

Strong experience in SRE or DevOps roles.

Proficiency in Linux/Unix systems administration.

Experience with cloud platforms (AWS, GCP, Azure).

Strong scripting skills (Python, Bash).

Experience with containerization and orchestration technologies (Docker, Kubernetes).

Deep understanding of Apache Druid and Apache Airflow.

Experience with monitoring and logging tools.

Knowledge of CI/CD pipelines.

Excellent problem-solving and troubleshooting skills.

Strong communication and collaboration skills.

Thanks & Regards

Laxmivijay

US IT Technical Recruiter.

ProCorp Systems Inc

2222 W Spring Creek Pkwy, STE 202, Plano, Texas 75023

Mail: [email protected]

LinkedIn: https://www.linkedin.com/in/laxmi-vijay-a5aa47231/

https://procorpsystems.co/

Keywords: continuous integration continuous deployment information technology Colorado
06:51 PM 17-Mar-25

