Data Engineer

Warsaw
Recruitment mode: in-person
Industry: Office administration

Responsibilities

  • Design, build and maintain production ETL pipelines in Databricks/Delta Lake that ingest real-world data (RWD: registries, claims, EHR extracts) and transform it into standard data models.
  • Implement harmonisation workflows that map incoming RWD to OMOP and to the internal CDISC SDTM canonical model, handling vocabulary mapping, unit normalization and provenance.
  • Extend the medallion architecture (bronze/silver/gold) patterns with robust validation, lineage tracking, partitioning and performance tuning.
  • Develop configurable, input-driven transformation frameworks so that clinical experts can define mapping rules through configuration files and catalogs.
  • Integrate AI/automation components (e.g., model‑assisted mapping, NLP for free text) with human‑in‑the‑loop review and confidence scoring.
  • Establish testing, CI/CD, monitoring and alerting for ETL jobs and automations; ensure reproducibility, versioning and governance.
  • Collaborate with clinical data scientists, data stewards and stakeholders to define requirements, data contracts and success metrics.

Requirements

  • Proven experience designing and implementing ETL pipelines in Databricks/Spark and Delta Lake.
  • Strong knowledge of OMOP CDM and experience mapping datasets to OMOP; familiarity with CDISC SDTM is a plus.
  • Expertise in data modelling, partitioning, performance tuning, and best practices for large clinical/RWD datasets.
  • Experience with vocabulary services and terminology mapping (OHDSI/Athena, UMLS, or similar).
  • Experience integrating AI/NLP components into data pipelines (entity extraction, mapping suggestions) is desirable.
  • Familiarity with testing frameworks for data (Great Expectations, Deequ), CI/CD, infrastructure as code, and orchestration tools (Databricks Jobs, Airflow).
  • Good communication skills and experience working with domain experts to capture requirements.
  • Fluent English.

Nice to have

  • Prior experience in pharma or clinical research environments.
  • Knowledge of data governance, privacy regulations and secure handling of patient data.
  • Experience with Unity Catalog, Databricks Delta Sharing, and cloud infrastructure (Azure/AWS).

Additional information

Source: 7n/Praca