HPC engineer
Ref No.: 17-00157

 
  • Researching, testing, recommending, implementing, and maintaining large-scale, resilient, distributed systems
  • Designing and maintaining a multi-petabyte distributed storage system
  • Optimizing resource utilization and job scheduling
  • Analyzing performance issues at scale
  • Troubleshooting node-level issues, such as kernel panics and system hangs
  • Documenting architecture and procedures for users and other members of the Systems team
Qualifications
If you have not used the following commands, please do not apply: vmstat, top, uname, ps, git, make, rpm, ping, tcpdump/wireshark.
  • At least 5 years of experience in Linux administration in a financial services or research environment
  • Hands-on knowledge of distributed filesystems, such as GPFS and Lustre, and of object storage
  • Extensive experience with HPC or cloud scheduling, such as GridEngine, HTCondor, SLURM, Mesos, and Nomad
  • Experience with configuration management