Software Engineer, Safeguards Evals

Anthropic

Generative AI, company

San Francisco, United States

$320,000 - $485,000 USD

Senior

Data & AI

About the role

Builds evaluation infrastructure for AI safety systems, measuring agent performance and driving improvements.

•This role builds the evaluation infrastructure that answers questions about the effectiveness of Anthropic's AI safety systems.
•You'll sit at the intersection of applied ML research and engineering, designing experiments to measure how well an investigative agent performs across harm areas.

Key Responsibilities

•Build and own the evaluation harness for an agentic investigation system.
•Construct high-quality eval datasets representing real-world misuse.
•Measure agent performance end-to-end and drive improvements.

Requirements

•Proficiency in Python and comfort working across the stack.
•Experience building and maintaining data pipelines.
•Experience working with LLMs and a working understanding of their capabilities and failure modes.

Tech stack

Python, LLMs, Anthropic API

Match insights

Tech: Python, LLMs, Anthropic API

Level: Senior
