Staff Engineer – Distributed System Programming (629206)
Ref No.: 18-07497
Location: Santa Clara, California
Position Type: Contract
Start Date / End Date: 07/02/2018 to 12/28/2018
Description:
About the Team
Client’s System Infrastructure Lab in Santa Clara, CA, is looking for highly motivated individuals with experience in deep learning frameworks, distributed architecture and programming, runtime scheduling, and performance tuning to help Client design and build a full-stack software tuning platform.
Responsibilities
As a senior engineer, you’ll work closely with a group of highly talented scientists to:
Interact with the architects and the production team to collect specific requirements for deep learning applications.
Build a flexible, well-maintained distributed framework that serves as a full-stack auto-tuning infrastructure spanning the application/algorithm, deep-learning network, operator, and compiler levels.
Abstract and define domain-specific search spaces, and explore each space for near-optimal parameters by tuning a set of system knobs (e.g., threading, algorithms, locality, networks).
Develop a set of specific tuners for different problems on the auto-tuning framework. These problems may span multiple levels of the software stack, such as application, algorithm, network, operators, and compiler.
Draft design and implementation documents and project progress reports.
Qualifications
PhD or MS in Computer Science or Electrical Engineering, with at least 2 years of development experience in systems programming.
Self-motivated with excellent teamwork and communication skills.
Strong hands-on ability in building and tuning systems.
Familiar with distributed systems, cache coherence, and multi-threading.
Thorough understanding of computer architecture, compilers, and memory subsystems.
Experience with GPU programming and with open-source deep learning frameworks such as TensorFlow and MXNet is a plus.
Proficiency in Python, C/C++, CUDA, and scripting languages.