Overview
Real-world evidence is increasingly used to complement clinical trial data in oncology, providing timely insights that can inform study design and drug development. This study demonstrates how machine learning (ML) and natural language processing (NLP) models can extract clinically relevant outcomes from electronic health records (EHRs) for patients with stage IV non–small cell lung cancer (NSCLC).
Researchers compared real-world patient data from the Flatiron Health Research Database with patients in the control arm of the IMpower132 clinical trial. They evaluated whether outcomes derived from tumor response and progression assessments, such as response rate, duration of response, and progression-free survival, could be reliably generated from routine clinical documentation using ML models. The real-world cohort was matched to the trial population using key eligibility criteria and statistical adjustments to improve comparability. Real-world response rates and other key outcomes closely mirrored those observed in the trial, supporting the accuracy and reliability of the ML-driven approach for generating real-world evidence in oncology.
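The outcome definitions above can be made concrete with a small calculation. The sketch below is a minimal illustration, not the study's analysis code: it assumes a hypothetical, simplified per-patient record (the `Patient` fields and the toy `cohort` values are invented for illustration and are not Flatiron Health's data model), computes a response rate as the share of patients whose best response is complete (CR) or partial (PR), and estimates median progression-free survival with the Kaplan-Meier estimator. It omits the cohort matching and statistical adjustment steps entirely.

```python
from dataclasses import dataclass


# Hypothetical per-patient record; field names are illustrative assumptions.
@dataclass
class Patient:
    best_response: str  # "CR", "PR", "SD", or "PD"
    pfs_months: float   # time to progression/death, or to censoring
    event: bool         # True if progression or death was observed; False if censored


def response_rate(patients):
    """Share of patients whose best response is complete (CR) or partial (PR)."""
    responders = sum(p.best_response in ("CR", "PR") for p in patients)
    return responders / len(patients)


def kaplan_meier(patients):
    """Return (time, estimated PFS probability) steps of the Kaplan-Meier estimator."""
    event_times = sorted({p.pfs_months for p in patients if p.event})
    surv, curve = 1.0, []
    for t in event_times:
        # Patients still under observation (neither progressed nor censored) just before t.
        at_risk = sum(p.pfs_months >= t for p in patients)
        events = sum(p.pfs_months == t and p.event for p in patients)
        surv *= 1 - events / at_risk
        curve.append((t, surv))
    return curve


def median_pfs(patients):
    """Earliest event time at which the estimated PFS probability falls to 0.5 or below."""
    for t, s in kaplan_meier(patients):
        if s <= 0.5:
            return t
    return None  # median not reached


# Toy cohort (hypothetical values); two patients are censored.
cohort = [
    Patient("PR", 4.0, True),
    Patient("SD", 2.0, True),
    Patient("CR", 12.0, False),
    Patient("PD", 1.0, True),
    Patient("PR", 7.0, True),
    Patient("SD", 3.0, False),
]

print(response_rate(cohort))  # 0.5
print(median_pfs(cohort))     # 4.0
```

Censored patients contribute to the at-risk count until their censoring time but never count as events, which is why the Kaplan-Meier approach is used rather than a simple average of follow-up times. Duration of response is typically estimated with the same machinery, restricted to responders and measured from the first documented response.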
Why this matters
Clinical trials are the gold standard for evaluating new cancer treatments, but they often enroll only a small, highly selected group of patients. Real-world evidence, drawn from patients treated in everyday clinical practice, can provide a more complete picture of how treatments perform across broader, more diverse populations. By showing that ML-extracted outcomes can reproduce results from a trial control arm, this study supports more robust and scalable use of real-world data to inform drug development, regulatory decisions, and clinical care, helping ensure that advances in cancer treatment benefit patients beyond those enrolled in trials.