Head of Performance Intensive Computing Engineering
Remote - Reno, Nevada, United States
CIQ OVERVIEW
CIQ believes in helping people do great things. We do this by building strong communities for open-source software, innovating software infrastructure, and building the next generation of performance computing. Our software stack consists of Rocky Linux, the CentOS replacement; Apptainer, the container solution of choice for HPC; Warewulf, a provisioning and cluster management solution; and Fuzzball, our next-generation performance computing platform, which is multi-cloud, multi-site, multi-cluster, and multi-node.
If you are interested in an environment built on ownership, diversity of thought, and pushing the limits of what is possible, then we would be interested in you.
POSITION SUMMARY
As the Head of Performance Intensive Computing Engineering, you will be responsible for overseeing the development of our computing offerings, including traditional HPC and next-generation computing infrastructures that support GenAI, ML, and compute- and data-driven analytics. Additional responsibilities include but are not limited to:
- Implementing the strategic vision and direction for our HPC, GenAI, and ML computing platform.
- Driving continuous improvements in business processes, and managing the implications of security and compliance guidelines.
- Building and maintaining strong relationships with leaders, customers, and partners, and participating in technology initiatives to understand current and future architecture and infrastructure needs.
- Developing team members, leading by example, and fostering an inclusive environment in support of our corporate values.
- Administering the department budget, including creating, planning, monitoring, and reconciling it, and directing resources.
- Leading the engineering for all performance-intensive computing initiatives at CIQ.
- Coordinating groups and teams of engineers to create the next-generation global computing infrastructure.
Job Requirements
NEEDED TO SUCCEED
Successful candidates will have team management and leadership experience as well as hands-on architecture design experience with GenAI, ML, and HPC use cases, workflows, and infrastructure, including storage, file system, InfiniBand, security, authentication, and compute architectures. Candidates should also have:
- Experience with compute job scheduling, training, learning, and inference.
- An understanding of computing algorithms and parallelization.
- Experience using Git to manage shared software configuration code bases.
- Hands-on experience with cloud-based services (e.g., Azure, AWS, GCP) as well as experience with Linux systems administration, optimization, and debugging.
- Proven experience with orchestration technologies such as Kubernetes and with container technologies such as Apptainer, Docker, and Podman.
- Experience with DevOps or DevSecOps methodologies, such as automation and configuration management.
- Experience configuring and using monitoring systems for cloud-native and HPC infrastructure.
- A good understanding of fundamental networking concepts and their practical applications.
- The ability to determine meaningful metrics and usage data for monthly status reports and health dashboards.
- Strong troubleshooting skills.
- A friendly, collaborative, humble, and honest attitude, always striving to be better.
EDUCATION AND EXPERIENCE
A minimum of 10 years in leadership roles, managing people, and reporting to and working with VP and C-level executives. At least 5 years of combined experience in HPC, GenAI, ML, and other performance-intensive computing environments. A minimum of 5 years of experience as an engineer or architect with HPC technologies. At least 3 years of experience as an engineer or architect with AI/ML technologies.