Data Center Operations Systems Engineer

Lambda's GPU cloud is used by deep learning engineers at Stanford, Berkeley, and MIT. Lambda's on-prem systems power research and engineering at Intel, Microsoft, Kaiser Permanente, major universities, and the Department of Defense.

If you'd like to build the world's best deep learning cloud, join us. 

What You'll Do

  • Ensure new server, storage and network infrastructure is properly racked, labeled, cabled, and configured
  • Document data center layout and network topology in DCIM software
  • Work with supply chain & manufacturing teams to ensure timely deployment of systems and project plans for large-scale deployments 
  • Participate in data center capacity and roadmap planning with sales and customer success teams to allocate floorspace 
  • Assess current and future state data center requirements based on growth plans and technology trends
  • Manage a parts depot inventory and track equipment through the delivery-store-stage-deploy-handoff process in each of our data centers
  • Work closely with the HW Support team to ensure data center infrastructure-related support tickets are resolved
  • Work with the RMA team to ensure faulty parts are returned and replacements are ordered
  • Create installation standards and documentation for placement, labeling, and cabling to drive consistency and discoverability across all data centers
  • Serve as a subject-matter expert on data center deployments during sales engagements for large-scale deployments in our data centers and at customer sites

You

  • Have experience with critical infrastructure systems supporting data centers, such as power distribution, air flow management, environmental monitoring, capacity planning, DCIM software, structured cabling, and cable management
  • Have strong Linux administration experience
  • Have experience in setting up networking appliances (Ethernet and InfiniBand) across multiple data center locations
  • Are action-oriented and have a strong willingness to learn
  • Are willing to travel up to 25% of the time between CA, TX, UT, and GA locations

Nice to have

  • Experience troubleshooting the following network layers, technologies, and system protocols: TCP/IP, UDP/IP, BGP, OSPF, SNMP, SSL, HTTP, FTP, SSH, Syslog, DHCP, DNS, RDP, NetBIOS, IP routing, Ethernet, switched Ethernet, 802.11x, NFS, and VLANs
  • Experience with working in large-scale distributed data center environments
  • Experience working with auditors to meet all compliance requirements (ISO/SOC)

About Lambda

  • We offer generous cash & equity compensation
  • Investors include Gradient Ventures, Google’s AI-focused venture fund
  • We are experiencing extremely high demand for our systems, with quarter-over-quarter and year-over-year profitability
  • Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
  • We have a wildly talented team of 130 and are growing fast
  • Depending on role, our workforce is remote across the U.S., with headquarters in San Jose, CA
  • Health, dental, and vision coverage for you and your dependents
  • Commuter/Work from home stipends
  • 401k Plan
  • Flexible Paid Time Off Plan that we all actually use

Salary Range Information 

Based on market data and other factors, the salary range for this position is $115,000-$170,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.

Apply for this job
Lambda | Infrastructure | Full-time | $115K - $170K | Onsite | Atlanta, GA