Software Engineer — Data Infrastructure
We’re on a mission to democratize AI by building the definitive AI data development platform. The AI landscape has gone through incredible change from 2016, when Snorkel started as a research project in the Stanford AI Lab, to the generative AI breakthroughs of today. But one thing has remained constant: the data you use to build AI is the key to achieving differentiation, high performance, and production-ready systems. We work with some of the world’s largest organizations to empower scientists, engineers, financial experts, product creators, journalists, and more to build custom AI with their data faster than ever before. Excited to help us redefine how AI is built? Apply to be the newest Snorkeler!
As a Software Engineer on our Enterprise & Data Infra team, you’ll focus on the Data Infra pillar, designing and building scalable services, APIs, and libraries that power data workloads across Snorkel’s enterprise platforms. You’ll tackle challenges involving structured and unstructured data, multiple storage types (hot/warm/cold), and various deployment models (public cloud/private cloud/on-prem). If you enjoy architecting performant data infrastructure solutions and working with modern data technologies, this is a unique opportunity to drive impactful innovation in the AI/ML space.
Main Responsibilities
Data Infrastructure Development
- Design, build, and maintain scalable services, APIs, and libraries for managing data workloads on Snorkel’s enterprise platforms.
- Implement secure, access-controlled, and governance-enabled storage solutions that meet diverse needs for structured/unstructured data.
- Integrate and support data ingress/egress with popular data providers (e.g., AWS, Databricks, Snowflake).
Scalability & Performance
- Work closely with cross-functional engineering teams to define workload and performance requirements for different storage tiers (hot, warm, cold).
- Architect flexible solutions that can be deployed in various environments, including public and private clouds, as well as on-premises.
Collaboration & Operations
- Collaborate with enterprise customers to understand their use cases, translate them into engineering specifications, and deliver high-quality solutions.
- Participate in an on-call rotation to troubleshoot and resolve production issues.
- Work a hybrid schedule of three days per week in our Redwood City HQ or SF office.
Minimum Qualifications
- Bachelor’s degree in Computer Science or related field, or equivalent professional experience
- 2+ years of experience in software development, preferably in distributed systems or cloud-native applications
- Strong Python development and debugging skills
- Excellent communication skills and a track record of cross-functional collaboration
Preferred Skills
- Experience with storage infrastructure or storage technologies
- Familiarity with cloud storage solutions (e.g., S3, GCS)
- Familiarity with databases (e.g., Postgres) and ORMs
- Ability to own problems end-to-end and learn new domains or technologies quickly
- Ability to design and integrate storage access controls with IAM systems
- [Nice to have] Experience with Kubernetes
The salary range for this position based in the San Francisco Bay Area is $110,000 - $200,000. All offers include equity compensation in the form of employee stock options.