Overview
Machine learning (ML) is a promising tool for curating real-world data from electronic health records (EHRs) and has the potential to accurately and efficiently extract nuanced clinical details from the chart. This study evaluated the reliability, completeness, and internal validity of a novel ML-based approach to generating real-world response (rwR) data, using Flatiron Health’s EHR-derived US database.
A deep learning-based natural language processing model was trained to extract clinicians’ documentation of changes in disease burden (e.g., complete response, partial response, stable disease, progressive disease) at imaging timepoints. Researchers tested the approach on a cohort of more than 4,000 patients spanning 15 common solid tumors and found a strong correlation between human-abstracted and ML-extracted rwR data, supporting the model’s accuracy. In addition, the longer survival observed for rwR-defined “responders” compared with “non-responders” is clinically expected and supports the internal validity of the variable.
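To make the validation step concrete, the sketch below shows one way agreement between human-abstracted and ML-extracted response labels could be quantified at matched imaging timepoints. It is a minimal illustration under assumed conditions: the label set, the paired example data, and the choice of accuracy, Cohen’s kappa, and a confusion matrix are illustrative assumptions, not the study’s actual implementation or statistics.

# Illustrative sketch only: quantifying agreement between human-abstracted
# and ML-extracted real-world response (rwR) labels at matched imaging
# timepoints. Categories and data are assumptions for demonstration.
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

# Assumed response categories at an imaging timepoint.
LABELS = ["CR", "PR", "SD", "PD"]

# Hypothetical paired labels for the same imaging timepoints.
human_abstracted = ["PR", "SD", "PD", "CR", "SD", "PR", "PD", "SD"]
ml_extracted     = ["PR", "SD", "PD", "CR", "PR", "PR", "PD", "SD"]

# Overall agreement and chance-corrected agreement (Cohen's kappa).
print("Accuracy:", accuracy_score(human_abstracted, ml_extracted))
print("Kappa:   ", cohen_kappa_score(human_abstracted, ml_extracted, labels=LABELS))

# Confusion matrix: rows = human-abstracted labels, columns = ML-extracted labels.
print(confusion_matrix(human_abstracted, ml_extracted, labels=LABELS))

In practice, a survival comparison between rwR-defined responders and non-responders (as described above) would be layered on top of an agreement analysis like this one to assess clinical validity.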
Why this matters
This study validates the use of ML to extract real-world response from unstructured data in the EHR, offering a scalable and efficient method for generating high-quality oncology real-world data. By improving the accuracy and completeness of response assessments, this approach enhances the ability to evaluate treatment effectiveness in real-world settings. These findings support the broader adoption of ML-driven methodologies in oncology research, ultimately advancing real-world evidence generation.