Sr Site Reliability Engineer | LOCAL-Fountain Valley, CA (Hybrid) | C2H
Email: [email protected]
Processing description:
http://bit.ly/4ey8w48
https://jobs.nvoids.com/job_details.jsp?id=544606&uid=

Please share your resume with [email protected]

Role: Senior Site Reliability Engineer

Location: Fountain Valley, CA (Hybrid)

Job Type: Contract

Job Description:

Details of the role:

The Sr. Site Reliability Engineer monitors all aspects of the Connectivity Services connected car platforms; identifies anomalies; addresses failures; communicates and escalates with Business Users; documents processes and procedures; and conducts and documents RCA efforts.

We are looking for not only a passionate leader but also an effective problem solver.

The ideal candidate will have extensive knowledge of and experience in Operational Monitoring, Incident Management, and Problem Management, as well as a background in Site Reliability.

The candidate must possess a good understanding of the Problem Management process, of IT infrastructure components (servers, network, storage, middleware/database, application software, and data center facilities), and of the processes followed in production environments and IT operations, as well as some of the tools used to monitor and control them.

Additionally, the candidate will be responsible for all incidents reported to the Command Center, from initiation until an acceptable workaround is in place or the incident is resolved.

Candidates must have experience in, but not limited to, support operations, escalation management, and critical incident response. Candidates need to demonstrate superior written and verbal communication skills. The successful candidate should have a demonstrated track record of process and operational effectiveness, streamlining and improving operations, quality control, and continuous improvement.

Required Skills:

7+ years of SRE-specific experience.
Exceptional written and verbal communication skills.

Expertise with Apache WS, Oracle WS, WebLogic, MongoDB, and NoSQL.

7+ years of technical operations/support experience with proven knowledge of and experience monitoring production environments.

7+ years of experience with event monitoring and/or incident/problem management, including setting up monitoring thresholds and views.

5+ years of broad technical experience with proven expertise in a majority of the following areas: servers, networks, hardware, operating systems (Windows, Linux, Kubernetes), virtualization software, middleware, and related base build infrastructure and software.

Experience and subject matter expertise in web, distributed computing, or cloud environments, as well as mainframe experience, is a plus.

As an IM-SRE Technical Team Lead, you will be responsible for:

Ensuring incident response procedures
are in place to mitigate interruptions and impact.

Enforcing Incident Management policy and processes; ensuring participants adhere to standards and procedures for processes, documentation, and communication.

Reviewing and analyzing Incident Management metrics and reporting them to leadership with thoughtful analysis and recommendations

Monitoring the effectiveness of Incident Management and driving continuous improvement.

Monitoring functionality of Incident
Management systems and applications

Ensuring all appropriate groups are
working on restoring service in a timely manner.

Notifying business participants, leadership, and impacted stakeholders of service impacts, and escalating and communicating their resolution, expectations, and cause.

Maintaining and managing 24x7x365
coverage of incident response team

Ensuring timely and accurate handoff of
problem and outage records

Training stakeholders in incident
management policies, processes, and procedures to ensure commonality and
consistency.

Addressing shortfalls in service levels and identifying and correcting process gaps.

Coordinating all continuous improvement
activities

Analyzing data, predicting trends and
themes

Auditing all incidents using the ServiceNow platform to ensure all records are complete and accurate.

Proactively detecting and preventing future problems/incidents and initiating the Problem Management process to allow quicker diagnosis and resolution.

Collaborating with subject matter
experts to refine operating processes and procedures to deliver and maintain
service more efficiently.

Ensuring problems progress through the Problem Management process in a timely and prioritized manner

Ensuring problem management information accurately reflects errors and is complete.

Maintaining inventory of problems under
analysis and their current progress and status

Analyzing and coordinating inter-organization responses and troubleshooting activities arising from critical/high incidents.

Managing and maintaining information
stored in the problem database.

Owning monitoring and incident/problem
reporting for status reports to management

Overseeing the scheduling of root cause analysis meetings and leading all RCA calls.

Bachelor's degree in computer science, IT, MIS, or a related field; or equivalent work experience.

Good to have skills:

Experience developing in multiple enterprise-level programming and scripting languages such as Java, JavaScript, .NET, Python, C#, COBOL, and Go

Experience with large SQL and NoSQL databases, e.g., Oracle, MongoDB, MySQL, PSQL

Experience with various operating systems such as Linux, Unix, and Windows

Experience with various code repositories, automated deployments, and branching, using tools such as GitLab, Bitbucket, and Subversion

Experience with monitoring tools such as
Splunk, Dynatrace, Elastic, New Relic, SolarWinds

DevOps and/or DevSecOps experience

Experience with enterprise-level CI/CD tools such as Ansible and Jenkins

Experience with IT service management tools such as ServiceNow, Atlassian, and BMC

Team management, mentoring, teaching, and coaching of staff

Documentation (technical writing, data modeling, wireframe, process flow)

Ability to communicate with a broad range of technical and leadership audiences

Act as the identified senior single point of contact with client management.

Define and enhance Root Cause Analysis processes, documentation, and procedures to ensure completeness, accuracy, and communication.

08:26 PM 18-Aug-23

Location: Fountain Valley, California