A framework for evaluating performance of LLM-based extraction from the electronic health record across different healthcare systems

Published

November 2025

Citation

Seidl-Rathkopf K, Schwarz A, Viani N, et al. A framework for evaluating performance of LLM-based extraction from the electronic health record across different healthcare systems. ESMO AI & Digital Oncology. 2025.

Overview

Large language models (LLMs) are reshaping how real-world data is extracted and curated in oncology, but they also bring new challenges in ensuring data quality. Flatiron Health’s Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework offers a structured approach to evaluate the accuracy of LLM-extracted information.

This study describes the initial application of the VALID framework in the UK and Germany, accounting for the complexities specific to each healthcare system. Researchers implemented a GDPR-compliant platform to enable duplicate abstraction, in which two expert reviewers independently extract clinical data from the same patient records. This made it possible to calculate key performance metrics, such as recall, precision, and F1 score, and to benchmark LLM performance against human data extraction.
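To make the metrics concrete, the sketch below shows one way recall, precision, and F1 score could be computed for a single binary variable. It is an illustration only, not the study's implementation: the `score` function, the `Metrics` class, the toy labels, and the choice to treat one abstractor as the reference standard are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    precision: float
    recall: float
    f1: float


def score(reference: list[bool], predicted: list[bool]) -> Metrics:
    """Score binary predictions against a reference standard (hypothetical helper)."""
    if len(reference) != len(predicted):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(reference, predicted))
    tp = sum(r and p for r, p in pairs)        # both say "present"
    fp = sum((not r) and p for r, p in pairs)  # predicted only
    fn = sum(r and (not p) for r, p in pairs)  # reference only
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return Metrics(precision, recall, f1)


# Made-up labels: does each of six patient records document a given therapy?
# Abstractor A is treated as the reference standard (an assumption).
abstractor_a = [True, True, False, True, False, False]
abstractor_b = [True, True, False, True, False, True]
llm_output = [True, True, False, False, False, True]

llm_vs_human = score(abstractor_a, llm_output)
human_vs_human = score(abstractor_a, abstractor_b)

print(f"LLM vs. abstractor A:          {llm_vs_human}")
print(f"Abstractor B vs. abstractor A: {human_vs_human}")
```

Scoring the second abstractor against the first with the same function gives a human-to-human baseline, so the LLM's scores can be read against how closely two experts agree with each other rather than against a perfect score.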

Why this matters

This research helps build trust in LLM-extracted data and helps ensure that it meets the specific needs of different countries. As LLMs become more widely used, this work lays the foundation for reliable and accurate data extraction at scale. Future efforts will focus on applying the remaining two pillars of the VALID framework, automated checks and replication analyses, to further strengthen the integrity of real-world data across global healthcare systems.

More publications

ESMO AI & Digital Oncology

November 2025

A pan-tumor and pan-country approach to LLM-based extraction of systemic therapies from the electronic health record

Viani N, Groizard L, Harrison K, et al.

ESMO AI & Digital Oncology

November 2025

Privacy-preserving error analysis loop for ML-based extraction of oncology EHR data

Groizard L, Dolezalova N, Kushnir M, et al.

arXiv

June 2025

Ensuring reliability of curated EHR-derived data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework

Estevez M, Singh N, Dyson L, et al.