A pan-tumor and pan-country approach to LLM-based extraction of systemic therapies from the electronic health record

Published

November 2025

Citation

Viani N, Groizard L, Harrison K, et al. A pan-tumor and pan-country approach to LLM-based extraction of systemic therapies from the electronic health record. ESMO AI & Digital Oncology. 2025.

Overview

Capturing comprehensive data on systemic therapies is crucial for enhancing real-world oncology datasets. However, systemic therapy data, especially for orally administered drugs, is often missing from structured electronic health records (EHRs), so recovering it requires labor-intensive manual abstraction.

This study explores the use of large language models (LLMs) to extract oral therapy details from unstructured clinical documents in the US and UK. Researchers found that LLMs showed high performance in both cohorts, demonstrating that LLM-based extraction of systemic oral therapies from unstructured EHRs is feasible, accurate, and transferable across oncology settings.

Why this matters

The ability to accurately and efficiently extract oral therapy information using LLMs not only reduces the need for manual data entry but also enhances the scalability of data extraction across different healthcare systems. The study's findings demonstrate that LLMs can be effectively adapted to account for regional differences in treatment and documentation, providing a robust tool for improving data quality. As the research expands to include non-English-speaking markets, this method has the potential to extend data extraction globally, supporting better clinical decisions and personalized patient care.
