Need AI Service Hosting Cloud DevOps Engineer, REMOTE at Remote, USA |
Email: [email protected] |
From: sandeep, arkinfotechspectrum [email protected]
Reply to: [email protected]

Job title: AI Service Hosting Cloud DevOps Engineer
Location: Remote

Job description:
Our client is one of the largest MSOs (Multiple Systems Operators) in the US. We are working with their AI infrastructure team, and they are building a new team of platform specialists (third-party labor) to support and enhance high-performance AI services. These are highly technical, hands-on roles focused on customer, application, and platform support of AI-focused workloads.

As an AI Platform Specialist, this role will provide application support. The team will deliver Tier 1 and Tier 2 support to developers and engineers while collaborating closely with Tier 3 and 4 platform teams and vendors for issue resolution. The roles require user knowledge of Kubernetes, virtualization, and cloud native technologies. Each specialist should have a focus on customer service along with goals of reliability, scalability, and performance.

Position's General Duties and Tasks
In these roles you will be responsible for:

Platform Support & Incident Response:
- Provide Tier 1 and Tier 2 support for AI-driven applications and workloads.
- Troubleshoot and resolve issues related to Kubernetes deployments, GPU utilization, and service performance.
- Collaborate with Tier 3 teams, including Kubernetes engineers and external vendors, to escalate and resolve complex issues.

Kubernetes & Cloud Native Operations:
- Full adoption, creation, and integrations into automated services using Helm, Ansible, Terraform, etc.
- Deploy, manage, and support containerized workloads, preferably AI workloads, on Google Anthos-powered Kubernetes clusters.
- Ensure adherence to pod security policies, automated rollouts/rollbacks, and best practices for scalability and security.

Observability & Documentation:
- Maintain detailed operational documentation, runbooks, and troubleshooting guides.
- Utilize monitoring/logging tools like New Relic, Big Panda, Prometheus, Grafana, and other observability frameworks.

Process Improvement & Collaboration:
- Work cross-functionally with developers, IT teams, and vendors to ensure seamless deployment and support of AI services.
- Contribute to CI/CD pipelines, automation, service, and security best practices.
- Track and communicate work through task management platforms (ServiceNow and Jira).

Requirements for this role include:
- Hybrid Cloud: In-depth knowledge of private (on-premises) and public (GCP & AWS) cloud architectures and services.
- Kubernetes Expertise: Hands-on experience with deploying and managing containerized workloads in Kubernetes.
- Technical Support & Troubleshooting: Proven ability to diagnose and resolve customer and platform issues in production environments.
- Strong Communication & Documentation: Ability to clearly document procedures, write knowledge base articles, and collaborate with customers and teams.
- Time Management & Accountability: Ability to work independently, prioritize tasks, and manage workload effectively.

Preferences (optional, nice-to-haves):
- Experience with GPU orchestration tools like Run:AI, NVIDIA AI Enterprise, VMware Private AI Foundation, etc.
- Exposure to AI coding assistants like Codeium, Copilot, or Tabnine.
- Proficient in development tools like Python, PyTorch, TensorFlow, Jupyter Notebooks, etc.

Required schedule availability for this position is Monday-Friday (09:00 AM to 05:00 PM EST). The shift timings can be changed as per client requirements. Additionally, resources may have to do overtime and work on weekends based on business requirements.
Experience: 10 years in total, 5 years in DevOps
Keywords: continuous integration, continuous deployment, artificial intelligence, information technology, Alabama, Georgia
11:37 PM 11-Mar-25 |