DevOps or Site Reliability Engineer needed: Remote (USC, GC) at Remote, USA |
Email: [email protected] |
http://bit.ly/4ey8w48
https://jobs.nvoids.com/job_details.jsp?id=153525&uid=

From: Geetanjali, Absolute IT [email protected]
Reply to: [email protected]

Job Title: Site Reliability Engineer/DevOps
Location: Dallas, TX 75342 (Candidate needs to travel to Dallas every quarter for PI planning workshops; all expenses will be reimbursed after the travel)
Job Description: Senior Site Reliability Engineer

UPDATED: Several more qualifying questions have been added, and they must be answered in your submissions. Suppliers need to provide responses to the questions below prior to submission:
1. How many years of experience does the candidate have in defining Error Budgets, setting up KPIs to monitor them, and implementing strategies and response mechanisms?
2. What monitoring and observability tools have you used in your experience? How did you act upon the data provided by these tools?
3. Have you worked on creating Infrastructure as Code? Which frameworks have you used to achieve it?
4. Have you used automation to address repeatable incidents in production?
5. Have you contributed code towards feature development?
6. What activities do you typically perform when the systems are reliable and stable?
7. Give examples of where you have helped the team ensure incidents do not repeat in production.
8. Have you worked in a serverless environment where there are no physical/cloud servers to manage?

The Senior Site Reliability Engineer is accountable for the availability, reliability, and performance of the services and platforms in a highly transactional 24x7 environment.
Role & Responsibilities:
Help build a Site Reliability Engineering culture by sharing best practices, approaches, documentation, and code with other engineering teams;
Define and set up KPIs to monitor Error Budgets;
Implement strategies to ensure Error Budgets stay above the defined acceptance levels;
Define and implement response mechanisms when Error Budget thresholds are breached;
Apply automation and software to any tasks or parts of the system that would benefit from it or are performed manually;
Troubleshoot complicated issues involving OS, networking, and databases in a cloud-based SaaS environment, handle live production incidents, and debug infrastructure and application issues, including in development and testing;
Monitor application performance, take steps to improve overall application performance and stability, and follow through with implementation (design, develop, and test);
Conduct system analysis and configuration management, and develop improvements for system software performance, availability, and reliability;
Design, write, ship, and motivate the creation of software and systems to increase observability, product reliability, and organizational efficiency;
Work closely with software engineers and QAs to ensure the system meets non-functional requirements such as performance, security, and availability;
Document your system knowledge as you acquire it over time, create runbooks, and ensure critical system information is readily available to those who need it;
Maintain and monitor the deployment and orchestration of servers, Docker containers, databases, and general backend infrastructure;
Design, develop, and test Terraform-based Infrastructure as Code scripts to automate AWS infrastructure setup;
Develop TypeScript and Node.js based REST/JSON web services deployed on AWS.
Mandatory Skills:
Bachelor's Degree in Computer Science or a related field, or an equivalent combination of education and experience;
10+ years of overall experience in Software Application Development & Engineering;
5+ years of SRE experience;
3+ years of experience with AWS services;
Experience in TypeScript, Node.js, and web development technologies;
Proficient in scripting languages such as PowerShell and/or Python;
Knowledge of DevOps methodologies and the tools involved, such as CI/CD concepts, CI/CD tools (Jenkins, CodePipeline, etc.), and automation and configuration tools (Puppet, Ansible, etc.), is a plus.

Kind Regards,
Geetanjali Banvi | Absolute IT | Technical Resource Specialist
116 Village Blvd, Suite 200, Princeton, New Jersey 08540
Direct: (609) 628 0404
Office: 201-228-3009 EXT 112
[email protected]
www.absoluting.com

Keywords: continuous integration, continuous deployment |
04:44 PM 18-Nov-22 |