Using large language models for scalable extraction of real-world progression events across multiple cancer types

Published

July 2025

Citation

Cohen A, Krismer K, Magee K, et al. Using large language models for scalable extraction of real-world progression events across multiple cancer types. AACR Special Conference in Cancer Research: Artificial Intelligence and Machine Learning. 2025.

Overview

Accurately identifying when a patient’s cancer has progressed is crucial for measuring important outcomes like real-world progression-free survival (rwPFS), evaluating the efficacy of cancer-directed therapy, and advancing oncology research. However, this information is often inconsistently documented within unstructured clinical notes buried deep in the electronic health record (EHR). Traditionally, expert human abstractors review these records to extract progression events, but this process is slow, costly, and difficult to scale. Scaling is harder still because progression is defined and documented differently across cancer types.

This study explored whether large language models (LLMs) could reliably extract real-world progression events and dates from unstructured EHR text for 14 major cancers. Researchers compared the performance of LLMs to that of expert human abstractors, measuring agreement on both the presence and timing of progression events and assessing how any differences affected key research outcomes like rwPFS. The study found that LLM performance approached expert human performance and that estimates of rwPFS were nearly identical between LLM- and human-curated data.
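To make the idea of agreement on presence and timing concrete, here is a minimal Python sketch of one way such a comparison could be set up. It is not the study's method: the function name `compare_progression`, the input layout (one progression date or `None` per patient), the choice of metrics, and the 30-day tolerance window are all assumptions made for illustration.

```python
from datetime import date

def compare_progression(human, llm, window_days=30):
    """Compare LLM-extracted progression events against human abstraction.

    human, llm: dict mapping patient ID -> progression date, or None when
    no progression was recorded. The human labels are treated as reference.
    window_days is a hypothetical tolerance for counting two dates as matching.
    """
    patients = list(human)
    human_pos = [p for p in patients if human[p] is not None]
    llm_pos = [p for p in patients if llm.get(p) is not None]
    both = [p for p in human_pos if llm.get(p) is not None]

    # Presence: do both sources agree on whether progression occurred?
    presence_agree = sum(
        (human[p] is None) == (llm.get(p) is None) for p in patients
    )
    # Timing: among patients flagged by both, are the dates close enough?
    within = sum(
        abs((human[p] - llm[p]).days) <= window_days for p in both
    )
    return {
        "presence_agreement": presence_agree / len(patients) if patients else None,
        "sensitivity": len(both) / len(human_pos) if human_pos else None,
        "ppv": len(both) / len(llm_pos) if llm_pos else None,
        "date_agreement": within / len(both) if both else None,
    }

# Made-up example data, for illustration only.
human = {"p1": date(2023, 1, 10), "p2": None,
         "p3": date(2023, 5, 1), "p4": date(2023, 3, 15)}
llm = {"p1": date(2023, 1, 20), "p2": None,
       "p3": None, "p4": date(2023, 6, 1)}
result = compare_progression(human, llm)
```

In this toy example the two sources agree on presence for three of four patients, and one of the two shared events falls inside the assumed window. A real evaluation would also need to handle multiple progression events per patient and propagate the differences into rwPFS estimates, which this sketch does not attempt.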

Why this matters

This research demonstrates the potential of LLMs to extract critical and nuanced clinical endpoints across multiple cancer types while maintaining data quality. By scaling data curation and helping generate large, reliable datasets, LLMs have the potential to accelerate cancer research, improve predictive tools, and support more personalized treatment decisions for patients.
