
Lead Site Reliability Engineer with Java :: San Antonio, Texas, USA
Email: prem@brightsol.ai
https://rb.gy/r1ud0k
https://jobs.nvoids.com/job_details.jsp?id=2337308&uid=
From: Prem, brightsol.ai, prem@brightsol.ai

Reply to:   prem@brightsol.ai

Role: Lead Site Reliability Engineer with Java

Location: San Antonio, Texas

Project Tenure: 18 Months

Customer: Banking Client

Relevant Experience: 14+ Years

Job Description:

As a Lead Site Reliability Engineer (SRE), you will leverage your extensive experience in SRE practices to maintain and enhance the reliability, performance, and scalability of mission-critical systems. You will play a crucial role in ensuring the continuous availability and optimal functioning of our services.

Key Responsibilities:

Senior-Level SRE Expertise: Apply your deep understanding of SRE principles to lead efforts in improving system reliability and operational efficiency.

Incident Management: Provide expert-level support during incidents, ensuring swift resolution with minimal service disruption. Lead post-incident reviews to drive continuous improvement.

Monitoring & Alerting: Design, implement, and optimize monitoring, alerting, and incident response processes. Ensure the effectiveness of these systems to proactively address potential issues.

Automation: Drive the automation of manual processes to enhance operational efficiency, reduce human error, and increase overall system resilience.

CI/CD Pipeline Management: Develop, maintain, and improve automated CI/CD pipelines using tools such as GitLab CI/CD and Jenkins, ensuring seamless and reliable deployment processes.

Cross-Functional Collaboration: Work closely with cross-functional teams to ensure the reliability, performance, and scalability of our infrastructure. Foster a culture of collaboration and knowledge sharing.

Support Across Time Zones: Provide support across all U.S. time zones, with the flexibility to work weekends, rotational shifts, and overtime as required to maintain service continuity.

Required Skills & Qualifications:

Java Programming: Advanced proficiency in Java, with a deep understanding of contemporary software development practices.

Kubernetes & Containerization: Extensive hands-on experience with Kubernetes, including containerization technologies like Docker and Kubernetes storage solutions such as Portworx.

Linux/Unix Systems: Strong command of Linux/Unix operating systems and shell scripting (Bash), with a focus on system reliability and automation.

Functional Programming: Proficiency in functional programming languages such as Prolog, Haskell, and OCaml.

Scripting & Automation: Experience with Python or Go, particularly in the context of scripting and automation tasks.

Virtualization: In-depth knowledge of VMware and other virtualization platforms, with a focus on optimizing virtual environments for reliability and performance.

Streaming Technologies: Expertise with Kafka Stream Generator, ksqlDB, cluster federation, and Spark Streams, including experience in managing and optimizing streaming data architectures.

Service Mesh & Networking: Familiarity with Istio and Anthos Service Mesh, with the ability to manage and optimize service meshes for complex environments.

Performance Monitoring & Debugging: Proficiency in using eBPF (extended Berkeley Packet Filter) for performance monitoring and debugging.

Monitoring & Logging Tools: Experience with industry-standard monitoring and logging tools such as Splunk, Prometheus, Datadog, and Kiali.

Load Balancing: Familiarity with Nginx Controller and Seesaw for effective load balancing and traffic management.

Infrastructure-as-Code (IaC): Competence in using Terraform for managing cloud infrastructure, ensuring consistency and scalability across environments.

Additional Requirements:

Flexibility: Willingness to work weekends and rotational shifts, and to provide 24/7 support as necessary to maintain service reliability and meet project deadlines.

Certifications Required: Kubernetes, Azure

Thanks,

Prem Kusuma

Keywords: continuous integration continuous deployment artificial intelligence golang
10:32 PM 11-Apr-25

