HPC Specialist at Remote, Remote, USA |
Email: [email protected] |
From: AJAY, KK Software Assciates [email protected] Reply to: [email protected] Role name: | System Administrator | Role Description: | HPC System Design and Implementation: Design and deploy HPC clusters, including compute, storage, and networking components. Evaluate and implement new HPC technologies to improve system performance and scalability.System Administration and Maintenance: Manage Linux-based HPC systems, including job schedulers (e.g., Slurm, PBS, or Grid Engine). Monitor system health and resolve performance bottlenecks or failures. Ensure uptime and optimal configuration of HPC resources.Performance Optimization: Fine-tune applications and workloads for optimal performance on HPC systems. Analyze job performance and provide recommendations to users for improvements.Storage and Data Management: Manage large-scale parallel file systems (e.g., Lustre, GPFS, or BeeGFS). Optimize data transfer and storage strategies for high-throughput workloads.User Support and Collaboration: Provide technical support and training to researchers and end users. Collaborate with interdisciplinary teams to understand computational requirements.Security and Compliance: Ensure HPC systems adhere to security best practices and compliance standards. Implement data backup and disaster recovery solutions. | Competencies: | High Performance Computing Architecture | Experience (Years): | 6-8 | Essential Skills: | Job Summary:We are seeking a highly skilled and experienced Senior HPC Specialist to design, implement, and maintain high-performance computing systems and solutions. Candidate will play a critical role in optimizing computational performance, ensuring the reliability of the infrastructure, and supporting advanced computational workloads | Desirable Skills: | HPC System Design and Implementation: Design and deploy HPC clusters, including compute, storage, and networking components. Evaluate and implement new HPC technologies to improve system performance and scalability.System Administration and Maintenance: Manage Linux-based HPC systems, including job schedulers (e.g., Slurm, PBS, or Grid Engine). Monitor system health and resolve performance bottlenecks or failures. Ensure uptime and optimal configuration of HPC resources.Performance Optimization: Fine-tune applications and workloads for optimal performance on HPC systems. Analyze job performance and provide recommendations to users for improvements.Storage and Data Management: Manage large-scale parallel file systems (e.g., Lustre, GPFS, or BeeGFS). Optimize data transfer and storage strategies for high-throughput workloads.User Support and Collaboration: Provide technical support and training to researchers and end users. Collaborate with interdisciplinary teams to understand computational requirements.Security and Compliance: Ensure HPC systems adhere to security best practices and compliance standards. Implement data backup and disaster recovery solutions. | Country: | United States | Branch | City | Location: | TCS - Denver, CO DENVER Denver, CO | Keywords: Colorado HPC Specialist [email protected] |
[email protected] View All |
02:15 AM 30-Jan-25 |