Software Engineer, Data Infrastructure

Cartesia
Voice AI company
San Francisco, United States
Senior
Data & AI

About the role

Build and manage data infrastructure for AI model training at Cartesia.

  • Data is the lifeblood of our models, and we're looking for a Software Engineer to help build the training data and ML data infrastructure at Cartesia.
  • This role sits at the intersection of data systems, model training, and inference.
  • You'll design and ship the pipelines, datasets, and infrastructure that feed our pre-training and post-training, with particular depth in audio and other multimodal data.

Key Responsibilities

  • Contribute to Cartesia's multi-modal data strategy across pre-training and post-training, spanning human, synthetic, and web-scale sources, with particular depth in audio.
  • Design and build scalable, high-throughput data pipelines for text, audio, and video covering ingestion, preprocessing, augmentation, dataset versioning, and data loading for training.
  • Partner closely with research and inference teams so data systems are co-designed with training and serving infrastructure (batching, GPU-aware loading, evaluation pipelines).

Requirements

  • Hands-on experience with ML data infrastructure: training data pipelines, dataset versioning, large-scale data loading, and the interplay between data systems and model training and inference.
  • Working knowledge of multimodal data, especially audio: formats, preprocessing, augmentation, and large-scale storage and streaming patterns.
  • Strong modern engineering execution: clean, well-tested code, fluency with current tools, and a willingness to pick the right tool for the problem rather than defaulting to familiar patterns.

Tech stack

Python, SQL, Airflow, Apache Kafka, AWS, Docker, Kubernetes

