
Privacy-preserving error analysis loop for ML-based extraction of oncology EHR data

Published

November 2025

Citation

Groizard L, Dolezalova N, Kushnir M, et al. Privacy-preserving error analysis loop for ML-based extraction of oncology EHR data. ESMO AI & Digital Oncology. 2025.

Overview

Accurate extraction of clinical data from electronic health records (EHRs) is crucial for advancing oncology research. However, machine learning (ML) extraction methods can produce errors that undermine data reliability.

This study introduces a privacy-compliant tool and workflow that allows clinical experts and data scientists to collaboratively identify ML extraction errors by comparing model outputs against a gold standard curated by human experts. Using an interactive dashboard built on Snowflake, the team reviewed model outputs alongside the human benchmark, categorized the errors, and used those categories to guide model improvements. This collaborative error analysis process improved model performance by 10% to 15% and shortened feedback cycles from weeks to days. The approach helps ensure that ML data extraction is both precise and compliant with European data protection standards.
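The comparison step at the core of this loop can be illustrated with a minimal Python sketch. It assumes that every ML-extracted value and every expert-curated value is keyed by a hypothetical (patient_id, field) pair, and it uses four illustrative categories (correct, missed, spurious, wrong_value) that are not taken from the study; the study's own error taxonomy and its Snowflake dashboard are not reproduced here, and the sketch works on in-memory data only.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Hypothetical record key: (patient_id, field_name). Values are normalized
# strings, or None when no value was extracted/recorded.
Key = Tuple[str, str]


def categorize(model: Optional[str], gold: Optional[str]) -> str:
    """Assign one illustrative error category to a single extracted field."""
    if model == gold:
        return "correct"      # agreement, including both sides empty
    if gold is None:
        return "spurious"     # model extracted a value the expert did not record
    if model is None:
        return "missed"       # expert recorded a value the model did not extract
    return "wrong_value"      # both present but they disagree


def error_report(model_out: Dict[Key, Optional[str]],
                 gold: Dict[Key, Optional[str]]) -> Counter:
    """Tally categories over every key seen in either the model output or the gold standard."""
    keys = set(model_out) | set(gold)
    return Counter(categorize(model_out.get(k), gold.get(k)) for k in keys)


def errors_by_field(model_out: Dict[Key, Optional[str]],
                    gold: Dict[Key, Optional[str]]) -> Counter:
    """Count non-correct categories per field, to show reviewers where the model fails."""
    counts: Counter = Counter()
    for key in set(model_out) | set(gold):
        category = categorize(model_out.get(key), gold.get(key))
        if category != "correct":
            counts[(key[1], category)] += 1
    return counts


if __name__ == "__main__":
    # Toy, made-up records for illustration only.
    gold = {("p1", "stage"): "III", ("p1", "her2"): "positive", ("p2", "stage"): "II"}
    model = {("p1", "stage"): "III", ("p1", "her2"): None,
             ("p2", "stage"): "IIA", ("p2", "her2"): "negative"}
    report = error_report(model, gold)
    print(report)
    print("accuracy:", report["correct"] / sum(report.values()))
    print(errors_by_field(model, gold))
```

Grouping errors by field rather than reporting only a single accuracy figure mirrors the idea of categorizing errors to inform targeted model changes; a real deployment would run such comparisons inside the governed data platform rather than on local copies of patient data.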

Why this matters

This approach improves the accuracy and reliability of data extracted from EHRs by ML, a critical requirement for real-world oncology research. Integrating human oversight with ML models supports high-quality data curation and paves the way for regulatory acceptance and multinational research. Together, these gains are essential for generating trustworthy data that can inform better clinical decisions and, ultimately, improve patient outcomes in oncology.
