Generalizable machine learning framework for predictive modeling of patient outcomes using oncology electronic health records

https://doi.org/10.1016/j.jval.2020.04.1752

Authors:
Stasiw, A, Falk, S, Garapati, S, Sridharma, S, Mendelsohn, D, Lakhtakia, S, Rech, A, Oldridge, D, Adamson, BJS, Chen, R

Objectives

In oncology, accurate and reliable prognostic assessments improve clinical decision-making in both clinical practice and research. We describe a generalizable, broadly applicable machine learning framework for predicting patient outcomes from oncology electronic health record (EHR) data.

Methods

The framework consists of four steps: (i) identifying the research question, including the patient-level variable ("label") to predict; (ii) defining an index date and an observation window, and extracting patient features for the relevant cohort; (iii) training models with standard cross-validation techniques to optimize predictive performance; and (iv) evaluating the models on an unseen subset of the cohort to predict the outcome label, assess model performance, and rank features by predictive importance. We applied this approach to a cohort of patients with multiple myeloma, building a model to predict 5-year mortality after autologous transplant.
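Steps (i) and (ii) can be illustrated with a minimal sketch. It assumes a hypothetical per-patient record holding an index date (here, the transplant date), a death date, a last follow-up date, and timestamped events. Several choices below are illustrative assumptions and do not come from the abstract:

- the 5 × 365-day horizon;
- excluding patients censored before the horizon;
- the 365-day observation window;
- keeping the most recent value per feature.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple

# Approximate 5-year horizon (5 x 365 days; leap days ignored for simplicity).
HORIZON = timedelta(days=5 * 365)


@dataclass
class Event:
    day: date
    kind: str          # hypothetical category, e.g. "lab" or "medication"
    name: str          # hypothetical item name, e.g. "m_spike"
    value: float = 1.0


@dataclass
class Patient:
    patient_id: str
    index_date: date              # e.g. autologous transplant date
    death_date: Optional[date]
    last_followup: date
    events: List[Event] = field(default_factory=list)


def mortality_label(p: Patient) -> Optional[int]:
    """Step (i): 1 = died within the horizon, 0 = known alive at the horizon,
    None = censored before the horizon (excluded here; an assumption)."""
    horizon_end = p.index_date + HORIZON
    if p.death_date is not None:
        return 1 if p.death_date <= horizon_end else 0
    return 0 if p.last_followup >= horizon_end else None


def window_features(p: Patient, window_days: int = 365) -> Dict[str, float]:
    """Step (ii): keep only events in [index - window, index) and take the
    most recent value per (kind, name); events on or after the index date are ignored."""
    start = p.index_date - timedelta(days=window_days)
    feats: Dict[str, float] = {}
    for ev in sorted(p.events, key=lambda e: e.day):
        if start <= ev.day < p.index_date:
            feats[f"{ev.kind}:{ev.name}"] = ev.value  # later events overwrite earlier ones
    return feats


def build_cohort(patients: List[Patient], window_days: int = 365
                 ) -> Tuple[List[Dict[str, float]], List[int]]:
    """Pair each labelable patient's window features with their label."""
    rows, labels = [], []
    for p in patients:
        y = mortality_label(p)
        if y is None:
            continue
        rows.append(window_features(p, window_days))
        labels.append(y)
    return rows, labels


# Toy records invented for demonstration only (not study data).
patients = [
    Patient("A", date(2010, 1, 1), date(2012, 6, 1), date(2012, 6, 1),
            [Event(date(2009, 12, 1), "lab", "m_spike", 2.1),
             Event(date(2010, 2, 1), "lab", "m_spike", 0.5)]),  # post-index: ignored
    Patient("B", date(2010, 1, 1), None, date(2016, 1, 1),
            [Event(date(2009, 6, 1), "lab", "m_spike", 0.8)]),
    Patient("C", date(2015, 1, 1), None, date(2017, 1, 1)),  # censored: excluded
]
X, y = build_cohort(patients)
print(X, y)  # [{'lab:m_spike': 2.1}, {'lab:m_spike': 0.8}] [1, 0]
```

Features are restricted to the window before the index date. This keeps information recorded after the transplant out of the model inputs.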

Results

The framework produced two outputs: (1) four standard machine learning models (logistic regression, random forest, support vector machine, and gradient boosted trees) with performance metrics (AUC, precision, recall, and accuracy); and (2) a ranked list of outcome-predictive patient features according to the models. In the multiple myeloma analysis (n=1099), the label was 5-year survival after the transplant date. EHR-derived features included demographics, medication administrations, lines of therapy, lab results, and cytogenetic or biomarker testing status. Random forest and gradient boosted trees achieved an AUC of 0.79 and an accuracy of 0.76. The most predictive features (highest absolute weight) were M-spike, chromosome 1 abnormalities, and age at diagnosis.
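Steps (iii) and (iv) can be sketched minimally with scikit-learn. The sketch fits the four model families named above on a synthetic stand-in dataset generated with make_classification; no real patient data are used. Each model is tuned by 5-fold cross-validated AUC on a training split. AUC, precision, recall, and accuracy are then reported once on a held-out split.

The following are assumptions, not details from the abstract:

- the hyperparameter grids;
- the 75/25 split;
- ranking features by permutation importance.

The abstract ranks features by absolute model weight. Only linear models and tree ensembles expose such weights directly, so permutation importance is swapped in here as a model-agnostic alternative that also covers a kernel SVM.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an EHR feature matrix (NOT real patient data).
X, y = make_classification(n_samples=1099, n_features=12, n_informative=4, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# Hold out an unseen subset for final evaluation (step iv).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# The four model families, each with a small illustrative grid (assumed values).
candidates = {
    "logistic_regression": (
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        {"logisticregression__C": [0.1, 1.0, 10.0]}),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [100, 200], "max_depth": [None, 5]}),
    "svm": (
        make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
        {"svc__C": [0.1, 1.0, 10.0]}),
    "gradient_boosted_trees": (
        GradientBoostingClassifier(random_state=0),
        {"n_estimators": [100], "max_depth": [2, 3]}),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, (estimator, grid) in candidates.items():
    # Step iii: tune hyperparameters by cross-validated AUC on the training split only.
    search = GridSearchCV(estimator, grid, scoring="roc_auc", cv=cv)
    search.fit(X_train, y_train)
    model = search.best_estimator_

    # Step iv: score the tuned model once on the held-out split.
    prob = model.predict_proba(X_test)[:, 1]
    pred = model.predict(X_test)

    # Rank features by the mean drop in held-out AUC when each one is shuffled.
    imp = permutation_importance(model, X_test, y_test, scoring="roc_auc",
                                 n_repeats=5, random_state=0)
    ranking = [feature_names[i] for i in np.argsort(imp.importances_mean)[::-1]]

    results[name] = {
        "auc": float(roc_auc_score(y_test, prob)),
        "precision": float(precision_score(y_test, pred)),
        "recall": float(recall_score(y_test, pred)),
        "accuracy": float(accuracy_score(y_test, pred)),
        "top_features": ranking[:3],
    }

for name, r in results.items():
    print(name, {k: (round(v, 3) if isinstance(v, float) else v) for k, v in r.items()})
```

The data are synthetic, so the printed metrics will not reproduce the values reported above. The block only shows how tuning, held-out scoring, and feature ranking fit together.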

Conclusions

We have developed a generalizable machine learning framework, agnostic to cancer type, that improves the prediction of specific outcomes and identifies potentially predictive features from oncology EHR-derived data. Customized models built on this framework could be applied to adverse event prediction, early detection of disease progression, and hospital readmission risk with relatively little duplicated effort, streamlining health economics and outcomes research (HEOR).

Sources:
ISPOR Annual Meeting
