ML Platform Engineer - Large scale compute

Who are we?

On a mission to make video easy for anyone …

It is an exciting time to join Synthesia as we have reached a milestone by becoming a Unicorn, having raised $90 million in Series C funding and now being valued at $1 billion!! ✨ 🦄

Synthesia is the world’s #1 AI video generation platform. Well, it’s actually a video production studio — in a browser. As in, no cameras or film crews at all. You simply choose an avatar, enter your script in one of 60 languages, and your video is ready in minutes. In Synthesia, you can build personalised on-the-fly videos, give your chatbot a human face or run 24/7 weather channels in different languages, to name just a few of the possibilities. 🎬

We believe the future of media is synthetic, and we are on a mission to turn cameras into code and make everyone a creator. To learn more, check out our brand video that explains what we’re doing at Synthesia.

About the role

We are looking for an ML Platform Engineer - Large scale compute to help streamline our ML development process and accelerate our research teams. You will be working as part of the newly created ML Platform Team, supporting 7+ teams that work "full stack". Our Research Engineers run Ops, Engineering and Science. We have built our own workflow engine. Now we want to streamline the process, adopt the right tools and super-charge the work we are doing.

🔬 You are someone who focuses on the developer experience, pays close attention to detail, and creates and communicates clear, well-defined processes. You love to support and help others. The happiest day is when you hear "it was just 1-click and everything worked". You love to build systems that unblock others and unlock scale. You are used to working in a fast-paced start-up environment.

👩‍💼 You will join a group of more than 40 Researchers and Engineers in the R&D department. This is an open, collaborative and highly supportive environment. We are all working together to build something big - the future of synthetic media and programmable video through Generative AI. We are proud of the culture, as well as the impact of the technology we are building.

What will you be doing?

🚀 In this position, you will set up and own our compute cluster on AWS. You will be supporting multiple deep tech ML teams on large model training across 100s of GPUs. We want you to:

Set up high-performance compute clusters that are easy for researchers to access, come pre-bundled with all necessary tools & packages and can be monitored easily.
Help our research teams set up their distributed ML infrastructure.
Establish best practices on running ML Models on distributed hardware.
Optimise the solution for ease-of-use, efficiency and maximum utilisation.
Configure & create infrastructure for researchers to run their large models on.
Implement new tooling in the areas of orchestration, experiment tracking, service deployment, Infrastructure as Code.

Who are you?

We are looking for an experienced MLOps professional, someone who thrives working in a busy start-up environment and is prepared to learn something new every day! You will be our "Go-to" person when it comes to training our models and conducting research on multiple GPUs in the Cloud. You will have:

3+ years of experience in Cloud Engineering / ML Ops / DevOps.
Experience setting up distributed compute in a Cloud environment (AWS or other provider).
Experience with Streaming / Batch Data Pipelines (Airflow, Apache Beam, Spark etc.).
Experience with Event-driven systems.
Experience in running Distributed processing in ML.
Experience in Model deployment and serving (Docker / Kubernetes / Terraform).
Experience supporting deep tech teams that work with Python package distribution.
Outstanding communication skills.

Nice to have…

If you have seen large-scale model training with 1000s of models built a day through data pipelines with 100s of component services, if you have seen multi-GPU large model training using large-scale audio-video datasets with 10000s of hours of content, or if you have created a streamlined platform to support world-class research teams whose work spans tech planned direct to product through to foundational research for top-tier academic conferences, we would love to talk to you. We'd also love to talk to you if this is what you dream of doing. 😎

Nice to have tools/skills:

Experience with general-purpose Workflow Orchestrators (Airflow, Kubeflow etc.).
Experience with Highly distributed ML training.
Experience dealing with Big Data.

The good stuff...

💸 You will be compensated well (salary + stock options + bonus). The base salary ranges between €80 000 and €100 000, depending on experience.

📍 You will work in a hybrid setting with an office in Amsterdam, the Netherlands

🏝 You get 25 days of annual leave + public holidays

🥳 You will join an established company culture with regular socials and company retreats

🤩 You get 4 weeks paid sabbatical after 4 years at the company + $10,000!!

👉 You can participate in a generous referral scheme

🚀 You will have huge opportunities for your career growth

You can see more about Who we are and How we work here: https://www.synthesia.io/careers
