Senior Research Engineer, Language Model Evaluations

Hippocratic AI’s mission is to develop the first safety-focused Large Language Model (LLM) for healthcare. The company believes that a safe LLM can dramatically improve healthcare accessibility and health outcomes worldwide by bringing deep healthcare expertise to every human. No other technology has the potential to have this level of global impact on health. The company was co-founded by CEO Munjal Shah, alongside a group of physicians, hospital administrators, healthcare professionals, and artificial intelligence researchers from El Camino Health, Johns Hopkins, Washington University in St. Louis, Stanford, Google, and Nvidia. Hippocratic AI has received a total of $120M in funding and is backed by leading investors, including General Catalyst, Andreessen Horowitz, Premji Invest, and SV Angel.

About the role

We are looking for a Research Engineer to lead evaluations for Hippocratic AI’s 1 trillion+ parameter constellation of Large Language Models. Your job will be to design and implement evaluations that measure the performance and safety of our models. As a Research Engineer focused on Evaluation, you'll work closely with our research and applied science teams to design experiments and build evaluation infrastructure. You'll help ensure that our LLMs are well benchmarked, with known performance and safety on a wide range of healthcare-related tasks, allowing us to compare against human feedback.

Requirements:

Have 5+ years of Python programming / machine learning research experience
Have experience using Large Language Models, and preferably have trained or fine-tuned large models in the past
Are comfortable writing code
Want to learn more about machine learning research
Care about patient safety
Want to design and implement rigorous evaluations

Preferred:

Building user interfaces for data analysis
Developing robust evaluation metrics for language models
Handling textual dataset sourcing, curation, and processing tasks at scale
Statistics

Representative projects:

Designing and running a new evaluation that tests our model’s reasoning capabilities
Leading the vision for what it takes to safely evaluate patient safety in the world of generative AI
Devising a consistent but representative evaluation suite for healthcare conversations
Running experiments to determine how prompting techniques affect results on industry benchmarks
Improving the tooling that researchers use to implement evaluations
Explaining our evaluations and their results to internal decision makers and stakeholders
Collaborating with a research team to develop a robust evaluation for a new model capability they are developing
