Platform ML - Numerics

About the Team

The Platform ML team builds the ML side of our state-of-the-art internal training framework used to train our cutting-edge models. We work on distributed model execution as well as the interfaces and implementation for model code, training, and inference. Our priorities are to maximize training throughput (how quickly we can train a new model) and researcher throughput (how quickly we can develop new models) with the goal of accelerating progress towards AGI. We frequently collaborate with other teams to speed up the development of new capabilities.

About the Role

As a Numerics Tech Lead, you will be the expert on the numerical properties of our custom low-precision kernels used in flagship training runs.

You will own and extend our suite of unit and integration tests for approximate numerical correctness, as well as our suite of tools for ensuring numerical correctness of live training runs. During flagship training runs you will collaborate closely with other leads on end-to-end debugging of numerical problems and help identify their root causes.

We’re looking for people who love understanding things at a very deep level, who are detail-oriented, and who are able to reason across all layers of our stack, from the ML algorithm all the way down to the hardware. You will contribute significantly to the success of our flagship training runs.

This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.

In this role, you will:

Become the main expert on the numerical properties of our low-precision GPU kernels
Develop and maintain an extensive test suite for numerical correctness
Develop and maintain a suite of tools used to ascertain the correctness and numerical health of our flagship training runs
Work with researchers to enable them to develop the next generation of models

You might thrive in this role if you:

Love understanding things deeply
Have incredible attention to detail
Have experience with low-precision numerics
Love understanding and debugging systems across all layers of abstraction
Have strong software engineering skills and are proficient in Python

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status.

OpenAI Affirmative Action and Equal Employment Opportunity Policy Statement

For US Based Candidates: Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
