
Assessing bias in LLM-extracted real-world data: A health equity analysis of access to care and outcomes in metastatic breast cancer

Published

October 2025

Citation

Mbah O, Ho G, Keane C, Yuan Q, Ryals C. Assessing bias in LLM-extracted real-world data: A health equity analysis of access to care and outcomes in metastatic breast cancer. ISPOR Europe. 2025.

Overview

The use of large language models (LLMs) to extract clinical data from electronic health records is gaining momentum, making it critical to understand whether these models introduce bias or mask existing inequities. This study assesses the fairness of LLM-based extraction by comparing it with human abstraction in evaluating racial and ethnic inequities in biomarker testing and overall survival among patients with HR+/HER2- metastatic breast cancer.

Using the US-based Flatiron Health Research Database, researchers analyzed data from more than 25,000 patients to determine whether LLMs can reliably replicate human-abstracted data in identifying inequities linked to race, ethnicity, and social determinants of health.
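To make the comparison concrete, here is a minimal sketch of the kind of check such a study implies: computing biomarker testing rates by race/ethnicity under each extraction method and seeing whether the between-group gaps agree. All column names and values are illustrative assumptions; the actual analysis and data are not reproduced here.

```python
# Minimal sketch, not the study's actual pipeline. Column names
# (race_ethnicity, tested_human, tested_llm) are hypothetical.
import pandas as pd

# Toy stand-in for a patient-level table (all values illustrative).
df = pd.DataFrame({
    "race_ethnicity": ["Black", "Black", "White", "White", "Hispanic", "Hispanic"],
    "tested_human":   [0, 1, 1, 1, 0, 1],  # human-abstracted biomarker testing flag
    "tested_llm":     [0, 1, 1, 1, 1, 1],  # LLM-extracted biomarker testing flag
})

# Biomarker testing rate per group under each extraction method.
rates = df.groupby("race_ethnicity")[["tested_human", "tested_llm"]].mean()

# If LLM extraction is a fair substitute, the inequity pattern it
# implies (gaps between groups) should track the human-abstracted one.
rates["llm_minus_human"] = rates["tested_llm"] - rates["tested_human"]
print(rates)
```

In a real analysis this comparison would extend to survival outcomes and adjust for social deprivation measures, but the core fairness question is the same: do the two data sources tell the same equity story?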

Why this matters

This research demonstrates that LLMs can provide a fair and scalable alternative to manual data abstraction in health equity studies. Both LLM-extracted and human-abstracted data show similar patterns of inequity: some historically minoritized groups, and patients living in areas with high social deprivation, face lower rates of biomarker testing and worse survival outcomes. By validating this approach, the study paves the way for more efficient and equitable data-driven insights in oncology.
