Opportunity as Site Reliability Engineer @ Plano, TX at Plano, Texas, USA
Email: [email protected]
Hello,

This is Akilesh from iTech US Inc. Below are the details of a position with one of our clients. Please let me know if you are interested, or if you can recommend someone who would be.

Position:

Site Reliability Engineer

Location:

Plano, TX (Onsite). Candidates local to TX only.

Duration:
12+ Months Contract

Job Description:

This SRE role will primarily involve learning GPU clusters, assisting in bringing up these systems, and developing automation to keep them operational, as well as working with various other DC GPU teams to incorporate requirements and address any issues on the systems. Specific responsibilities:

Working with the platform engineering team to develop and automate management of an infrastructure control plane and deployment system for GPU clusters.

Working with the release engineering team to automate the application of updates and system configuration management tools.

Resolution of problem tickets reported by internal and external customers for GPU cluster systems.

Develop and enhance internal and third-party network and cluster management tools, applications, and processes that enable internal teams and clusters to build, test, and optimize high-performance networks supporting large-scale GPU clusters.

Assist in developing the software ecosystems needed for at-scale cluster operations, providing cluster-as-a-service for internal and customer access systems.

This responsibility includes some involvement with rack-and-stack data center operations, at-scale software install and configuration management, and at-scale system provisioning, helping to build and operate an on-prem cloud service for internal stakeholders that forms a model for customer adoption.

Helping to create an enterprise-class operational model for internal cluster systems that provides reliable, secure, automated infrastructure for rapid response to changing requirements, efficient use of assets, and a reference template for customer adoption.

Participate in a strong customer-centric culture focused on meeting commitments.

10+ years of experience in high-performance networks, platform hardware, firmware, and system management solutions at scale.

Strong Linux admin knowledge and skills around installation, configuration, package management, and system management across multiple OS distributions. Related skill in system performance tuning at user and kernel mode is a plus.

Experience with virtualization and containerization, including systems like KVM, Docker, Podman, OpenShift, and Kubernetes.

Strong experience with system automation and configuration management at scale using tools like Ansible, Salt, Chef, Puppet, Bash, and Python.

Experience working with dev teams developing and maintaining our CI/CD pipeline development environment.

Experience using common industry tools to fix software issues and automate operational processes.

Strong networking knowledge.

Education:

At least a bachelor's degree (or equivalent experience) in Computer Science, Software/Electronics Engineering, Information Systems, or a closely related field is required.

----

Thanks & Regards

Akilesh Kumar

iTech US Inc

Email: [email protected]

Go Green! Please do not print this e-mail unless necessary.

--

Keywords: continuous integration continuous deployment information technology golang Texas
10:15 PM 21-Jan-25

