About the role
Own the quality and coverage of the data behind our models.
To build truly global AI, our models must be trained on data that reflects the world's diversity of languages and cultures.
We are searching for a Research Engineer to own the quality and coverage of the data behind our models.
Key Responsibilities
- Design and build large-scale datasets for model training.
- Build evaluations of speech models.
- Implement techniques for steering data generation to improve model intelligence.
- Build automated quality control systems to validate and filter generated data.
- Partner with product teams to ensure support for key languages and markets.
Requirements
- Experience building or working with large multilingual datasets.
- Experience with generative models (speech, text, or multimodal).
- Ability to help guide human annotation and evaluation across multiple languages.
Tech stack
Python, NumPy, Pandas, TensorFlow, scikit-learn, NLP, LLMs, Hugging Face, Airflow, Databricks, AWS, PostgreSQL, Git, Linux
Match insights
Tech: Python, NumPy, Pandas, TensorFlow, scikit-learn
Level: Mid