DataOps Engineer

Who are we?

On a mission to make video easy for anyone …

Synthesia is the world’s #1 AI video generation platform. Well, it’s actually a video production studio — in a browser. As in, no cameras or film crews at all. You simply choose an avatar, enter your script in one of 60 languages, and your video is ready in minutes. In Synthesia, you can build personalised on-the-fly videos, give your chatbot a human face or run 24/7 weather channels in different languages, to name just a few of the possibilities. 🎬

We believe the future of media is synthetic, and we are on a mission to turn cameras into code and make everyone a creator. To learn more, check out our brand video that explains what we’re doing at Synthesia.

About the role

We are looking for a DataOps Platform Engineer to help R&D manage large audio-video datasets at Synthesia. We are creating a new ML Platform team that will support 7+ teams developing cutting-edge solutions in generative video synthesis. You will join us to set up a world-class data function, managing a petabyte-scale data lake and building complex audio/visual data pipelines to bring order and make data consumption simple. You are going to super-charge our research.

🔬 You are someone who loves DevOps, loves data, and wants to work at scale. You pay close attention to detail, and you create and communicate clear, well-defined processes. You love to support and help others. Your happiest day is when you hear "it was so easy, just 1-click and everything worked". You love to build systems that unblock others and unlock scale.

👩‍💼 You will join a group of more than 35 Researchers and Engineers in the R&D department. This is an open, collaborative and highly supportive environment. We are all working together to build something big - the future of synthetic media and programmable video through Generative AI. We are proud of the culture, as well as the impact of the technology we are building.

What will you be doing?

🚀 In this position, you will work alongside our Senior DataOps Engineer to streamline our ML development process, giving our ML teams in R&D at Synthesia access to large-scale datasets. You will help set up our audio-video data pipeline for the Video team and our speech data pipeline for the Voice team. You will be responsible for:

Data Ops for data management, versioning, usage tracking, logging.
Setup of a data lake and data transform pipelines for large-scale audio-visual datasets.
Integration of third-party annotation services for continuous data annotation and active learning.
Setup of metadata stores and APIs to access data-sets on demand for ML training.
Support for data streaming to train large models.
Data pipelines - deploy custom ML data transformations, working with our ML team.
Data access - create transient data-sets on demand to support ML model training.
Data tracking - usage tracking and monitoring across all data sources.
Establish the workflow for continual data delivery and annotation.

Who are you?

We are looking for an experienced DataOps professional who thrives in a busy start-up environment and is prepared to learn something new every day! You will have:

At least 3 years of experience in Data Engineering / Data Ops / Data Science.
Experience working in an AWS cloud environment.
Been involved in managing large-scale datasets: not just one-off data collection tasks, but continuous data collection.
Been responsible for setting up data ops (ingest / storage / transform / access) end-to-end for multiple teams.
Good Python skills - you can write high-quality code and tests, and are comfortable with CI/CD and deploying code.
Experience with Streaming / Batch Data Pipelines (Airflow, Prefect, Argo, Dagster, etc.).
Experience with event-driven systems.
Experience in handling heterogeneous types of data (e.g. audio / text / video / tabular data).
Experience with any type of RDBMS (relational database management system) or non-relational DBMS.
An eye for detail, ensuring data governance, traceability and versioning are held to the highest standard.
Outstanding communication skills.

Nice to have…

You have seen large-scale data management and data governance, multi-modal datasets, multi-stage data transform pipelines, and large model training with tens to hundreds of thousands of hours of content. You have worked with MLOps to provide data sources for world-class research teams, spanning tech planned direct to product as well as foundational research for top-tier academic conferences. If that sounds like you, we would love to talk to you! We'd also love to talk to you if this is what you dream of doing. 😎

Nice to have tools:

Understanding of MLOps and working within the ML space.
Experience with wide varieties of database types (SQL, columnar, document based).
Good understanding of Video/Audio data.

The good stuff...

💸 You will be compensated well (salary + stock options + bonus). The base salary ranges from £80,000 to £100,000, depending on experience.

📍 You will work in a hybrid setting with an office in London

🏝 You get 25 days of annual leave + public holidays

🥳 You will join an established company culture with regular socials and company retreats

🤩 You get 4 weeks paid sabbatical after 4 years at the company + $10,000!!

👉 You can participate in a generous referral scheme

🚀 You will have huge opportunities for your career growth

You can see more about Who we are and How we work here: https://www.synthesia.io/careers
