Job Description
We are seeking a senior distributed machine learning (ML) research developer to join our team working on a novel AI safety agenda. In this role, you will work closely with ML research scientists to solve difficult training and inference problems involving very large models.
Key Responsibilities
- Collaborate with researchers to accelerate research, model training, and inference, and to facilitate the use of large-scale models in distributed computing environments.
- Investigate performance bottlenecks, profile research experiment code, debug reported issues, and optimize the utilization of computing resources.
- Develop tools and libraries to simplify and orchestrate the use of distributed computing resources for research experiments.
- Establish, document, and maintain best practices for large-scale, distributed ML model development workflows.
Skills and Qualifications
- A degree in a relevant computer science f...