DevOps & Site Reliability Engineer - Atlanta, Georgia, USA (Onsite from Day 1, Local Candidates Only)

http://bit.ly/4ey8w48
https://jobs.nvoids.com/job_details.jsp?id=1646094&uid=

Role: DevOps & Site Reliability Engineer

Location: Atlanta, GA (onsite from day 1; local candidates only)

Duration: 6-12 Months

Interview: Phone and Skype

Description:

Seeking a Site Reliability Engineer Consultant

Our client is on a journey of transformation to become the best IT organization
in the airline industry. They are changing the way they do business from top to
bottom as they strive to create meaningful and innovative solutions, and they
are looking for team members to help realize that vision.

Responsibilities:

Manage and optimize data streaming and API components in on-premises OpenShift and AWS

Proactively review the application's APIs and processes to identify opportunities to optimize response times for various application components

Automate various types of testing, including data quality checks, and automate delivery and deployment to production

Develop integrations between the application (on-premises and in AWS) and our third-party tools (ServiceNow, VersionOne, Sumo Logic)

Work with teams to create SLIs and SLOs

Actively monitor and lead troubleshooting of degraded performance and hard-to-define issues in the platform applications, develop solutions, and document the artifacts from root cause analysis in the backlog

Evolve the cloud infrastructure ecosystem for our application suite by experimenting with emerging technologies and completing prototypes to understand their benefits

Design and develop CI/CD pipelines to deploy various application artifacts, including APIs and data processing jobs

Analyze, design, and develop the artifacts that configure monitoring and alerting metrics so that support engineers can proactively validate, troubleshoot, and resolve issues in a timely manner

Maintain data integrity and access control by using AWS security tools and services such as HSM, IAM, etc.

Understand and develop tools to monitor AWS billing for the services, generate cost-related reports, and help develop and implement cost optimization strategies

Work with enterprise security architects to design and implement data security tools and measures, including data encryption and key management; design and develop solutions to address security vulnerabilities discovered by the internal security audit team, vendors, the security community, etc.; and design and develop solutions that let the support team regularly scan for, review, and fix security issues

Regularly and proactively monitor and analyze the capacity and performance of the platform, and work with the architecture team to design and implement elastic infrastructure that accommodates irregular bursts of user traffic and requests

Work with the architecture team to develop a backup strategy and implement the backup solution for critical data and application components for service restoration and disaster recovery purposes

Work with architecture, infrastructure, and application teams to provide input on continuous improvement of design, performance, and security

Implement and improve monitoring, alerting, and logging solutions to detect and respond to incidents

Collaborate closely with the development team to deploy applications and services and ensure they meet reliability and performance standards

Automate deployment, configuration management, and troubleshooting processes to streamline operations

Participate in the on-call rotation, triage production incidents, lead RCAs, and implement preventive actions

Be at the forefront of Cloud and Big Data technology

Establish yourself as a technical leader by gaining exposure to a broad range of industry-leading technologies that will help drive acceleration

Support highly available, business-critical applications

Serve as the escalation point for complex and hard-to-define issues in both on-premises and AWS environments

Requirements:

4-6 years of experience

BS degree in Computer Science or a related technical field, or equivalent practical experience

3+ years of related DevOps or SysOps engineering experience with a focus on major cloud platforms (AWS preferred)

2+ years of application development experience, including data streaming and deploying/monitoring high-availability critical application components

1+ years in a Site Reliability Engineering organization preferred

Deep understanding of AWS services (Lambda, S3, SQS, IAM, Route 53, etc.) and proficiency in infrastructure as code (e.g., Terraform, CloudFormation)

Hands-on experience with monitoring tools such as CloudWatch, Sumo Logic, Dynatrace, Grafana, or similar for application performance monitoring and alerting

Proficiency in scripting and automation (e.g., Python, Bash) to build and maintain deployment pipelines and infrastructure

Strong analytical and troubleshooting skills to diagnose and resolve complex infrastructure, application, and data issues

Experience with containerization (Docker, Kubernetes) and serverless architecture (AWS Lambda)

Required Skills:

Deep understanding of the operations of AWS cloud platforms

Well versed in automation, scripting, and monitoring, including use of tools from the major cloud platforms, including but not limited to OpenShift, CloudFormation, Terraform, Ansible, and Python

Significant technical knowledge of infrastructure layers, including but not limited to: Linux OS, major virtualization platforms, traditional and software-defined networks, load balancers, firewalls, API tools, element/performance/intelligent monitoring tools, storage, backup strategy, etc.

Significant knowledge and experience in end-to-end operations for enterprise systems and applications, including driving issue resolution for mission-critical systems

Experience automating, operationalizing, and improving Development/QA processes using CI/CD tools (GitLab, GitHub, Jenkins, Maven, Gradle, Nexus)

Working experience with Software Release Management

Thanks & Regards

Ankush Vikal

Sr Technical Recruiter

E-Mail: [email protected]


08:42 PM 09-Aug-24
