Senior Software Engineer, Data Processing

Protege | AI Training company
Remote | Senior
Data & AI

About the role

Process large-scale source data into clean, structured, enriched, validated, AI-ready datasets.

  • Protege is hiring a Senior Software Engineer to own the data processing layer at ingestion.

Key responsibilities

  • Design, build, and operate the ingestion systems that process large volumes of multimodal data into usable, well-structured datasets.
  • Own the ingestion path end to end, from how data lands to how it is validated, processed, tracked, and made available downstream.
  • Build modality-specific processing steps for real-world source data, such as medical imaging processing, audio and video metadata extraction, quality validation, and notes processing.
  • Build parsers, validators, and normalization logic that can systematically handle messy, non-standard, and high-variance source formats.
  • Turn repeated one-off data handling work into reusable processing patterns, internal tooling, and platform capabilities.

Tech stack

Python, Pandas, NumPy, Spark, Airflow, dbt, Snowflake, Databricks, Apache Kafka, Apache Flink
