| Sr Site Reliability Engineer | LOCAL-Fountain Valley, CA (Hybrid) | C2H at Valley, Alabama, USA |
| Email: [email protected] |
|
Processing description: http://bit.ly/4ey8w48 https://jobs.nvoids.com/job_details.jsp?id=544606&uid=
Please share resume to [email protected]
Role: Senior Site Reliability Engineer
Location: Fountain Valley, CA (Hybrid)
Job Type: Contract
Job Description:
Details of the role: The Sr. Site Reliability Engineer monitors all aspects of the Connectivity Services connected car platforms, identifies anomalies, addresses failures, and communicates and escalates with Business Users. The role also documents processes and procedures, and conducts and documents RCA efforts.
We are looking for not only a passionate leader, but an effective problem solver. The ideal candidate will have extensive knowledge of and experience in Operational Monitoring, Incident Management, and Problem Management, and a background in Site Reliability. The candidate must possess a good understanding of the Problem Management process, of IT infrastructure components (servers, network, storage, middleware/database, application software, and data center facilities), and of the processes followed in production environments and IT operations, as well as some of the tools used to monitor and control them.
Additionally, the candidate will be responsible for all incidents reported to the Command Center from initiation until an acceptable work-around is in place or the incident is resolved. Candidates must have experience in areas including, but not limited to, support operations, escalation management, and critical incident response. Candidates need to demonstrate superior written and verbal communication skills. The successful candidate should have a demonstrated track record of process and operational effectiveness, streamlining and improving operations, quality control, and continuous improvement.
Required Skills:
7+ years of SRE-specific experience.
Exceptional written and verbal communication skills.
Expertise with Apache WS, Oracle WS, WebLogic, MongoDB, NoSQL.
7+ years of technical operations/support experience with proven knowledge of and experience monitoring production environments.
7+ years of experience with event monitoring and/or incident/problem management, including setting up monitoring thresholds and views.
5+ years of broad technical experience with proven expertise in a majority of the following areas: servers, networks, hardware, operating systems (Windows, Linux, Kubernetes), virtualization software, middleware, and related base build infrastructure and software.
Experience and subject matter expertise in web, distributed computing, or cloud environments; mainframe experience is a plus.
As an IM-SRE Technical Team Lead, you will be responsible for:
Ensuring incident response procedures are in place to mitigate interruptions and impact.
Enforcing Incident Management policy and processes, and ensuring participants adhere to standards and procedures for processes, documentation, and communication.
Reviewing and analyzing Incident Management metrics and reporting them to leadership with thoughtful analysis and recommendations.
Monitoring the effectiveness of Incident Management and driving continuous improvement.
Monitoring the functionality of Incident Management systems and applications.
Ensuring all appropriate groups are working on restoring service in a timely manner.
Notifying, escalating, and communicating to business participants, leadership, and impacted stakeholders the existence of service impacts, resolution, expectations, and cause.
Maintaining and managing 24x7x365 coverage of the incident response team.
Ensuring timely and accurate handoff of problem and outage records.
Training stakeholders in incident management policies, processes, and procedures to ensure commonality and consistency.
Addressing shortfalls in service levels and identifying and correcting process gaps.
Coordinating all continuous improvement activities.
Analyzing data and predicting trends and themes.
Auditing the completeness and accuracy of all incidents using the ServiceNow Platform, ensuring all records are complete and accurate.
Proactively detecting and preventing future problems/incidents and initiating the Problem Management process to allow quicker diagnosis and resolution.
Collaborating with subject matter experts to refine operating processes and procedures to deliver and maintain service more efficiently.
Ensuring problems progress through the Problem Management process in a timely and prioritized manner.
Ensuring problem management information accurately reflects errors and is complete.
Maintaining an inventory of problems under analysis and their current progress and status.
Analyzing and coordinating inter-organization responses and troubleshooting activities arising from critical/high incidents.
Managing and maintaining information stored in the problem database.
Owning monitoring and incident/problem reporting for status reports to management.
Overseeing the scheduling of root cause analysis meetings and leading all RCA calls.
Bachelor's degree in computer science, IT, MIS, or a related field; or equivalent work experience.
Good to have skills:
Experience developing in multiple enterprise-level programming and scripting languages such as Java, JavaScript, .NET, Python, C#, COBOL, Go.
Experience with large SQL and NoSQL databases, e.g. Oracle, MongoDB, MySQL, PSQL.
Experience with various operating systems such as Linux, Unix, Windows.
Experience with varying code repositories, auto deployments, and branching with tools such as GitLab, Bitbucket, Subversion.
Experience with monitoring tools such as Splunk, Dynatrace, Elastic, New Relic, SolarWinds.
DevOps and/or DevSecOps experience.
Experience with enterprise-level CI/CD tools such as Ansible, Jenkins.
Experience with IT service management tools such as ServiceNow, Atlassian, BMC.
Team management, mentoring, teaching, and coaching of staff.
Documentation (technical writing, data modeling, wireframe, process flow).
Ability to communicate with a broad range of technical and leadership audiences.
Acting as the identified senior Single Point of Contact with client management.
Defining and enhancing Root Cause Analysis processes, documentation, and procedures to ensure completeness, accuracy, and communication. |
| [email protected] |
| 08:26 PM 18-Aug-23 |