Network Engineer (LAN & WAN)
As a Network Engineer focused on LAN and WAN, you will play a critical role in developing, managing, and optimizing the front-end network components of OpenAI’s supercomputing infrastructure.
Your expertise will ensure that our networks are fast, reliable, and scalable to meet the demands of training frontier AI models.
This includes managing both local (LAN) and long-distance (WAN) connectivity across our data centers, optimizing performance, and ensuring seamless communication between compute nodes and clusters. It also includes writing code to instrument and observe the network.
Our team primarily uses Python and some Rust, so familiarity with or interest in working with this stack is essential.
This role is based in San Francisco, CA, with a hybrid work model of 3 days per week in the office. Relocation assistance is available.
In this role, you will:
- Design, manage, and optimize WAN and LAN infrastructure for OpenAI’s supercomputers.
- Develop and maintain data collection and monitoring systems to ensure network visibility and performance.
- Troubleshoot and resolve network issues across TCP/IP, BGP, and the physical layer.
- Automate network issue detection and resolution to reduce operational overhead.
- Work closely with hardware and systems engineers to meet the performance demands of distributed AI training workloads.
You might thrive in this role if you:
- Have 5+ years of experience in networking or related infrastructure roles.
- Possess strong expertise in networking technologies, protocols, and design principles.
- Have hands-on experience troubleshooting complex networking issues in both LAN and WAN environments.
- Deeply understand how to set up TCP/IP networks from scratch (e.g., BGP, ECMP routing).
- Have a deep understanding of network protocols and technologies such as TCP/IP, BGP, and VLANs.
- Are familiar with optical connectors and optical circuit switches (OCS).
- Understand advanced concepts in routing, forwarding, and network management systems.
- Have experience with telemetry, traffic engineering, and congestion management to optimize network performance.
- Are skilled in collaborating across teams, combining technical expertise with excellent problem-solving and communication abilities.
- Take ownership of problems end-to-end and maintain a commitment to continuous learning to solve challenges effectively.
- Are familiar with InfiniBand, RoCE, or RDMA in HPC (High-Performance Computing) or similar environments.
About OpenAI
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.
We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status.
OpenAI Affirmative Action and Equal Employment Opportunity Policy Statement
For US-based candidates: Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records.
We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.
OpenAI Global Applicant Privacy Policy
At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.