Engineering Manager, Evals

Anysphere

AI Coding company

San Francisco

Manager

Software Engineering

About the role

Lead the evaluation team to create high-signal datasets and tools for coding agents.

•Lead the group responsible for creating high-signal evaluation datasets for coding agents and building the tools engineers use to write and run them.

Key Responsibilities

•Set the eval roadmap end-to-end: what we measure, why it matters, and how signals turn into shipping and training decisions.
•Lead and grow a high-impact team of engineers and researchers building eval datasets and developer-friendly tools to write and run evals.
•Guide the next generation of CursorBench so it continues to reflect real developer workflows at Cursor, and expand it with new evals that measure other properties developers value.
•Define crisp online quality signals and turn regressions into robust guardrails.
•Integrate evals into the decision-making cadence for launches, deploys, and model training loops.

Requirements

•You’ve led engineering teams shipping production systems and have strong people leadership and coaching skills.
•You can align research, product, data, and infrastructure on what “good” means—and turn that into durable metrics, processes, and release/training rituals.
•You have good taste and strong opinions on model and agent behaviors, and you stay up to date on emerging research and industry trends.
•You have strong data acumen and can collaborate effectively with data scientists and researchers.

Level: Manager