DevOps or Site Reliability Engineer needed: Remote (USC, GC), USA
Email: [email protected]
http://bit.ly/4ey8w48
https://jobs.nvoids.com/job_details.jsp?id=153525&uid=

From:
Geetanjali,
Absolute IT
[email protected]
Reply to:   [email protected]

Job Title: Site Reliability Engineer/DevOps
Location: Dallas, TX, 75342
Job Description:

Dallas, TX (Candidate needs to travel to Dallas every quarter for PI planning workshops; all expenses will be reimbursed after the travel)

Senior Site Reliability Engineer

UPDATED: SEVERAL MORE QUALIFYING QUESTIONS HAVE BEEN ADDED, AND YOU WILL NEED TO ANSWER THESE IN YOUR SUBMISSIONS.
Suppliers need to provide responses to the questions below prior to submission:
1. How many years of experience does the candidate have in defining Error Budgets, setting up KPIs to monitor them, and implementing strategies and response mechanisms?
2. What monitoring and observability tools have you used? How did you act upon the data provided by these tools?
3. Have you worked on creating Infrastructure as Code? Which frameworks have you used to achieve it?
4. Have you used automation to address repeatable incidents in production?
5. Have you contributed code towards feature development?
6. What activities do you typically do when the systems are reliable and stable?
7. Give examples of where you have helped the team ensure incidents do not repeat in production.
8. Have you worked in a serverless environment where there are no physical/cloud servers to manage?

The Senior Site Reliability Engineer is accountable for the availability, reliability, and performance of the services and platforms in a highly transactional 24x7 environment.

Role & Responsibilities:

Help build a Site Reliability Engineering culture by sharing best practices, approaches, documentation, and code with other engineering teams

Define and set up KPIs to monitor Error Budgets
Implement strategies to ensure Error Budgets stay above the defined acceptance levels
Define and implement response mechanisms when Error Budget thresholds are breached

Apply automation and software to any tasks or parts of the system that would benefit from it or are performed manually;
Troubleshoot complicated issues involving OS, networking, and databases in a cloud-based SaaS environment; handle live production incidents; debug and troubleshoot infrastructure and application issues, including development and testing;
Monitor application performance, take steps to improve overall application performance and stability, and follow through with implementation (design, develop and test);
Conduct system analysis and configuration management, and develop improvements for system software performance, availability and reliability;
Design, write, ship, and motivate the creation of software and systems to increase observability, product reliability and organizational efficiency;
Work closely with software engineers and QAs to ensure the system meets non-functional requirements such as performance, security, and availability;
Document your system knowledge as you acquire it over time, create runbooks, and ensure critical system information is readily available to those who need it;
Maintain and monitor the deployment and orchestration of the servers, Docker containers, databases, and general backend infrastructure;
Design, develop and test Terraform-based Infrastructure as Code scripts to automate AWS infrastructure setup;
Develop TypeScript and Node.js based REST/JSON web services deployed on AWS.

Mandatory Skills:

Bachelor's Degree in Computer Science or related; or equivalent combination of education and experience
10+ years overall experience in software application development and engineering
5+ years of SRE experience
3+ years experience in AWS services
Experience in TypeScript, Node.js and web development technologies
Proficient in scripting languages such as PowerShell and/or Python
Knowledge of DevOps methodologies and the tools involved, such as CI/CD concepts, CI/CD tools (Jenkins, CodePipeline, etc.), and automation and configuration tools (Puppet, Ansible, etc.), is a plus.

Kind Regards,
Geetanjali Banvi | Absolute IT | Technical Resource Specialist
116 Village Blvd Suite 200  Princeton New Jersey  08540
Direct: (609) 628 0404
Office: 201-228-3009 EXT 112
[email protected]
www.absoluting.com

Keywords: continuous integration continuous deployment
04:44 PM 18-Nov-22

