DataOps Engineer

Who are we?

On a mission to make video easy for anyone …

Synthesia is the world’s #1 AI video generation platform. Well, it’s actually a video production studio — in a browser. As in, no cameras or film crews at all. You simply choose an avatar, enter your script in one of 60 languages, and your video is ready in minutes. In Synthesia, you can build personalised on-the-fly videos, give your chatbot a human face or run 24/7 weather channels in different languages, to name just a few of the possibilities. 🎬

We believe the future of media is synthetic, and we are on a mission to turn cameras into code and make everyone a creator. To learn more, check out our brand video that explains what we’re doing at Synthesia.

 

About the role

We are looking for a DataOps Platform Engineer to help R&D manage large datasets of audio-video data at Synthesia. We are creating a new ML Platform team that will support 7+ teams developing cutting-edge solutions in generative video synthesis. You will join us to set up a world-class data function, managing a lake with PB-scale data and building complex audio/visual data pipelines to bring order and make data consumption simple. You are going to super-charge our research.

🔬 You are someone who loves DevOps, loves data, and wants to work at scale. You pay close attention to detail, and you create and communicate clear, well-defined processes. You love to support and help others. Your happiest day is when you hear "it was so easy, just 1-click and everything worked". You love to build systems that unblock others and unlock scale.

👩‍💼 You will join a group of more than 35 Researchers and Engineers in the R&D department. This is an open, collaborative and highly supportive environment. We are all working together to build something big - the future of synthetic media and programmable video through Generative AI. We are proud of the culture, as well as the impact of the technology we are building.

 

What will you be doing?

🚀 In this position, you will work alongside our Senior DataOps Engineer to streamline our ML development process by giving our ML teams in R&D access to large-scale datasets. You will help set up our audio-video data pipeline for the Video team and our speech data pipeline for the Voice team. To do this, you will need:

  • 3+ years minimum experience in Data Engineering / Data Ops / Data Science.
  • Experience working in AWS Cloud Environment.
  • Been involved in managing large-scale datasets, not just one-off data collection tasks; you have seen continuous data collection.
  • Been responsible for setting up data ops (ingest / storage / transform / access) end-to-end for multiple teams.
  • Good Python skills: you can write high-quality, well-tested code, and are comfortable with CI/CD and deploying code.
  • Experience with Streaming / Batch Data Pipelines (Airflow, Prefect, Argo, Dagster, etc.).
  • Experience with event-driven systems.
  • Experience in handling heterogeneous types of data (e.g. audio / text / video / tabular data).
  • Experience with any type of RDBMS (Relational Database Management System) or non-relational DBMS.
  • An eye for detail, ensuring data governance, traceability and versioning are held to the highest standard.
  • Outstanding communication skills.

Who are you?

We are looking for an experienced DataOps professional who thrives in a busy start-up environment and is prepared to learn something new every day! You will have:

  • 3+ years minimum experience in Data Engineering / Data Ops / Data Science.
  • Been involved in managing large-scale datasets, not just one-off data collection tasks; you have seen continuous data collection.
  • Been responsible for setting up data ops (ingest / storage / transform / access) end-to-end for multiple teams.
  • Worked with audio/video data and understand how to manage it at PB scale.
  • Experience with Streaming / Batch Data Pipelines (Airflow, Apache Beam, Spark etc.).
  • Experience with event-driven systems.
  • Experience in handling heterogeneous types of data (e.g. audio / text / video / tabular data).
  • Experience with any type of RDBMS (Relational Database Management System).
  • Outstanding communication skills.

 

Nice to have…

If you have seen large-scale data management and data governance, multi-modal datasets, multi-stage data transform pipelines, and large model training with 10,000s to 100,000s of hours of content, or if you have worked with ML Ops to provide data sources to support world-class research teams spanning tech planned direct to product as well as foundational research for top-tier academic conferences, then we would love to talk to you! We'd also love to talk to you if this is what you dream of doing. 😎

Nice to have tools:

  • AWS Cloud Environment.
  • Experience with general-purpose Workflow Orchestrators (Airflow, Kubeflow etc.).
  • Experience with a wide variety of database types (SQL, columnar, document-based).

 

The good stuff...

💸 You will be compensated well (salary + stock options + bonus). The base salary ranges between €80,000 and €100,000, depending on experience.

📍 You will work in a hybrid setting with an office in Munich

🏝 You get 25 days of annual leave + public holidays

🥳 You will join an established company culture with regular socials and company retreats

🤩 You get 4 weeks paid sabbatical after 4 years at the company + $10,000!!

👉 You can participate in a generous referral scheme

🚀 You will have huge opportunities for your career growth

You can see more about Who we are and How we work here: https://www.synthesia.io/careers

Synthesia, DataOps, Full-time 💰 80K - 100K EUR, Hybrid 📍 München, Bavaria, Germany