
A framework for evaluating performance of LLM-based extraction from the electronic health record across different healthcare systems

Published

November 2025

Citation

Seidl-Rathkopf K, Schwarz A, Viani N, et al. A framework for evaluating performance of LLM-based extraction from the electronic health record across different healthcare systems. ESMO AI & Digital Oncology. 2025.

Overview

Large language models (LLMs) are reshaping how real-world data is extracted and curated in oncology, but they also bring new challenges in ensuring data quality. Flatiron Health’s Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework offers a structured approach to evaluate the accuracy of LLM-extracted information. 

This study focuses on the initial application of the VALID framework in the UK and Germany, considering the unique complexities of different healthcare systems. By implementing a GDPR-compliant platform, researchers enabled duplicate abstraction, where two expert reviewers independently extracted clinical data from patient records. This process allowed for the calculation of key performance metrics, such as recall, precision, and F1 score, to benchmark LLM performance against human data extraction.
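To make the metrics concrete, here is a minimal sketch of how recall, precision, and F1 score could be computed once extracted values are compared against a human reference. It rests on assumptions that are not taken from the study: each extracted fact is represented as a (patient, variable, value) tuple, a match must be exact, and the function name `precision_recall_f1` and all sample records are hypothetical.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall_f1(
    extracted: Iterable[Hashable], reference: Iterable[Hashable]
) -> Tuple[float, float, float]:
    """Score a set of extracted facts against a reference set (exact match)."""
    extracted_set = set(extracted)
    reference_set = set(reference)
    true_positives = len(extracted_set & reference_set)

    # Precision: share of extracted facts that also appear in the reference.
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    # Recall: share of reference facts that the extractor found.
    recall = true_positives / len(reference_set) if reference_set else 0.0
    # F1: harmonic mean of precision and recall.
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical toy records: (patient id, variable, value).
abstractor_a = {
    ("pt1", "biomarker_status", "positive"),
    ("pt2", "biomarker_status", "negative"),
    ("pt3", "biomarker_status", "positive"),
    ("pt4", "biomarker_status", "negative"),
}
abstractor_b = {
    ("pt1", "biomarker_status", "positive"),
    ("pt2", "biomarker_status", "negative"),
    ("pt3", "biomarker_status", "positive"),
}
llm_output = {
    ("pt1", "biomarker_status", "positive"),
    ("pt2", "biomarker_status", "negative"),
    ("pt3", "biomarker_status", "negative"),  # disagrees with abstractor A
    ("pt4", "biomarker_status", "negative"),
}

# Treat abstractor A as the reference for both comparisons.
llm_scores = precision_recall_f1(llm_output, abstractor_a)
human_scores = precision_recall_f1(abstractor_b, abstractor_a)
print("LLM vs. abstractor A:", llm_scores)
print("Abstractor B vs. abstractor A:", human_scores)
```

Scoring the second abstractor against the first with the same function gives a human-versus-human baseline, which is one way duplicate abstraction can put the LLM's scores in context. Whether the study computed its benchmark this way is not stated here.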

Why this matters

This research is crucial for building trust in LLM-extracted data and for ensuring that the data meets the specific needs of different countries. As LLMs become more widely used, this work lays the foundation for reliable and accurate data extraction at scale. Future efforts will focus on applying the remaining two pillars of the VALID framework, automated checks and replication analyses, to enhance the integrity of real-world data across global healthcare systems.
