About the role
Design and evaluate datasets and environments for advanced AI systems.
- Define, design, and evaluate datasets, tasks, environments, and benchmarks for advanced AI systems.
Key Responsibilities
- Design and build datasets, tasks, environments, and evaluation assets for benchmarking agentic systems.
- Develop frameworks that assess diversity, realism, coverage, fidelity, informativeness, and downstream usefulness of datasets.
- Evaluate planning, tool use, robustness, recovery from failure, task completion, and generalization behavior in RL-style environments.
Requirements
- Strong machine learning background with a focus on reinforcement learning and agentic systems.
- Experience with designing and evaluating high-quality datasets for AI systems.
- Proficiency in Python and familiarity with TensorFlow or PyTorch.
Tech stack
Python, TensorFlow, PyTorch
Match insights
Tech: Python, TensorFlow, PyTorch
Level: Mid