Role: AI Platform Architect/SME, Remote (USA)
Email: suresh@gacsol.com
https://short-link.me/15H5b
https://jobs.nvoids.com/job_details.jsp?id=2252595&uid=
From: Suresh, GAC Solutions (suresh@gacsol.com)

Reply to: suresh@gacsol.com

Hi,

My name is Suresh from GAC Solutions, and I'm writing to let you know about a contract job opportunity.

Role: AI Platform Architect/SME

Location: REMOTE

Job description:

Required Skills & Experience

Hybrid Cloud: In-depth knowledge of private (on-premises) and public (GCP & AWS) cloud architectures and services.

AI/ML Software: Developer experience with DevOps practices (Git, Jenkins, etc.) as well as working with AI/ML engineers and data scientists.

AI/ML Hardware: Experience deploying, supporting, and optimizing on-premises and cloud GPU-enabled (NVIDIA & AMD) infrastructure (VMs & Containers).

Kubernetes Expertise: Hands-on experience with deploying and managing containerized workloads in Kubernetes.

Technical Support & Troubleshooting: Proven ability to diagnose and resolve customer and platform issues in production environments.

Strong Communication & Documentation: Ability to clearly document procedures, write knowledge base articles, and collaborate with customers and teams.

Time Management & Accountability: Ability to work independently, prioritize tasks, and manage workload effectively.

Preferred Qualifications

Experience with GPU orchestration tools like Run:ai, NVIDIA AI Enterprise, VMware Private AI Foundation, etc.

Exposure to AI coding assistants like Codeium, Copilot, or Tabnine.

Proficiency with development tools like Python, PyTorch, TensorFlow, Jupyter Notebooks, etc.

Key Responsibilities

AI Platform Specialists in these roles provide application and GPU support. The team delivers Tier 1 and Tier 2 support to developers and engineers while collaborating closely with Tier 3 and Tier 4 platform teams and vendors on issue resolution. The roles require user-level knowledge of Kubernetes, virtualization, and cloud-native technologies, as well as operator-level knowledge of GPUs and other AI supporting services. Each specialist should focus on customer service, with reliability, scalability, and performance as goals.

Platform Support & Incident Response

- Provide Tier 1 & Tier 2 support for AI-driven applications and workloads.

- Troubleshoot and resolve issues related to Kubernetes deployments, GPU utilization, and service performance.

- Collaborate with Tier 3+ teams, including Kubernetes engineers and external vendors, to escalate and resolve complex issues.

Kubernetes & Cloud-Native Operations

- Adopt, create, and integrate automated services using Helm, Ansible, Terraform, etc.

- Deploy, manage, and support containerized AI workloads on Google Anthos-powered Kubernetes clusters.

- Ensure adherence to pod security policies, automated rollouts/rollbacks, and best practices for scalable and secure Kubernetes environments.

GPU Infrastructure & AI Services Management

- Optimize and support GPU-enabled workloads including CUDA and other AI acceleration frameworks.

- Assist in the installation, configuration, and support of AI coding assistants (e.g., Codeium).

Observability & Documentation

- Maintain detailed operational documentation, runbooks, and troubleshooting guides.

- Utilize monitoring/logging tools like New Relic, BigPanda, Prometheus, Grafana, and other observability frameworks.

Process Improvement & Collaboration

- Work cross-functionally with developers, IT teams, and vendors to ensure seamless deployment and support of AI services.

- Contribute to CI/CD pipelines, automation, and service and security best practices.

- Track and communicate work through task management platforms (ServiceNow and Jira).

Thanks

Suresh Jami

Trainee Recruiter

GAC Solutions Inc.

www.gacsol.com

Experts in Digitalization and Engineering - Enterprise 4.0

04:41 AM 13-Mar-25

