Senior Software Engineer, Site Reliability

Who we are

At Gretel, we're building the platform that developers and data scientists trust for safe, AI-ready data. Our platform combines differential privacy with state-of-the-art AI to generate synthetic data across multiple modalities, whether starting from your own unique data or creating entirely new datasets from prompts. We enable organizations to unlock sensitive data for AI development and analytics. We believe you shouldn't have to search for the right dataset; you should be able to generate exactly what you need while maintaining the highest standards of privacy and utility.

We’re a highly collaborative remote-first company with employees across the U.S. and Canada. Our innovative and transparent culture offers employees the autonomy, tools, and trust to act like owners. We’re disrupting how organizations innovate with data and are looking for a talented SRE to join our mission.

The impact you’ll have

As a Senior or Staff Site Reliability Engineer (SRE) at Gretel, you will ensure the safety, security, and reliability of our cloud infrastructure. This includes our compute infrastructure, container orchestration platform, deployment pipelines, and observability stack.

What you will do

Build and maintain Gretel's observability stack. Measure and monitor Gretel's availability, latency, and overall system health
Scale systems sustainably through automation, and continuously improve and evolve them
Manage and lead incident response, recovery, and blameless postmortems
Partner with software engineers to troubleshoot production issues
Build tools and frameworks that help Gretel engineers be more productive
Ship complex ML/AI models in partnership with Gretel's applied science and engineering teams

Minimum Qualifications

Experience with at least one cloud platform (we use AWS heavily)
Experience with Docker and Kubernetes
Ability to write software and tools in Python or Go
Experience with monitoring, alerting and operations
Experience operating highly available distributed systems in the cloud
Experience identifying, diagnosing, and responding to operational outages

Preferred Qualifications

Experience with infrastructure as code (Terraform, CloudFormation, etc.)
Experience with build systems such as Bazel
Experience shipping applications with complex dependencies (PyTorch, TensorFlow)
Software engineering skills beyond script writing (TDD, design patterns, etc.)
Experience with DevOps or CI/CD pipelines

We think the best ideas come from the blending of diverse perspectives and experiences, which will lead to a stronger company and advancements in technologies. We hire individuals whose peers call them subject matter experts, whose curiosity draws them to new edges of their field and who like to laugh. We are deeply collaborative, apolitical and mission-oriented.

Gretel is an equal opportunity employer. Individuals seeking employment and employees at Gretel are considered without regard to race, color, religion, national origin, age, sex, gender, gender identity, gender expression, sexual orientation, marital status, medical condition, ancestry, disability, military or veteran status, or any other characteristic protected by applicable law.

Accommodations: We celebrate diversity and are committed to creating an inclusive environment for all candidates and employees. If you need assistance or an accommodation due to a disability, please let your recruiter know.

Compensation

Employee compensation will be determined based on interview performance, level of experience, specialization of skills, and market rate. During the offer discussion, your recruiter will review the finalized base salary, bonus (for applicable roles), benefits and perks (additional information available on our career site), and stock options as they’ll be reflected in the offer letter.

Employees hired in the U.S. and Canada can expect the below information to reflect a reasonable estimate of the salary offered for this role. Salary ranges are updated regularly using premium market data. (Please note: it is unusual for new hires to receive a base salary at the top of the range. Additionally, the value of Gretel.ai’s stock options is not included in the salary bands and may represent a significant portion of your compensation.)

Senior Site Reliability Engineer: $180,000-$210,000 USD

Staff Site Reliability Engineer: $200,000-$230,000 USD
