About the role
Process large-scale source data into clean, structured, enriched, validated, AI-ready datasets.
- Protege is hiring a Senior Software Engineer to own the data processing layer at ingestion.
Key responsibilities
- Design, build, and operate the ingestion systems that process large volumes of multimodal data into usable, well-structured datasets.
- Own the ingestion path end to end, from how data lands to how it is validated, processed, tracked, and made available downstream.
- Build modality-specific processing steps for real-world source data, such as medical imaging processing, audio and video metadata extraction, quality validation, and notes processing.
- Build parsers, validators, and normalization logic that systematically handle messy, non-standard, and high-variance source formats.
- Turn repeated one-off data handling work into reusable processing patterns, internal tooling, and platform capabilities.
Tech stack
Python, Pandas, NumPy, Spark, Airflow, dbt, Snowflake, Databricks, Apache Kafka, Apache Flink