Platform ML - Numerics

About the Team

The Platform ML team builds the ML side of our state-of-the-art internal training framework used to train our cutting-edge models. We work on distributed model execution as well as the interfaces and implementation for model code, training, and inference. Our priorities are to maximize training throughput (how quickly we can train a new model) and researcher throughput (how quickly we can develop new models) with the goal of accelerating progress towards AGI. We frequently collaborate with other teams to speed up the development of new capabilities.

About the Role

As a Numerics Tech Lead, you will be the expert on the numerical properties of the custom low-precision kernels used in our flagship training runs.

You will own and extend our suite of unit and integration tests for approximate numerical correctness, as well as our suite of tools for ensuring the numerical correctness of live training runs. During flagship training runs, you will collaborate closely with other leads on end-to-end debugging of numerical problems and help identify their root causes.

We’re looking for people who love understanding things at a very deep level, who are detail-oriented, and who are able to reason across all layers of our stack, from the ML algorithm all the way down to the hardware. You will contribute significantly to the success of our flagship training runs.

This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.

In this role, you will:

  • Become the main expert on the numerical properties of our low-precision GPU kernels
  • Develop and maintain an extensive test suite for numerical correctness
  • Develop and maintain a suite of tools used to ascertain the correctness and numerical health of our flagship training runs
  • Work with researchers to enable them to develop the next generation of models

You might thrive in this role if you:

  • Love understanding things deeply
  • Have incredible attention to detail
  • Have experience with low-precision numerics 
  • Love understanding and debugging systems across all layers of abstraction
  • Have strong software engineering skills and are proficient in Python

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity. 

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status. 

OpenAI Affirmative Action and Equal Employment Opportunity Policy Statement

For US-based candidates: Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
