Generalizable machine learning framework for predictive modeling of patient outcomes using oncology electronic health records

Published

May 2020

Citation

Stasiw, A, Falk, S, Garapati, S, Sridharma, S, Mendelsohn, D, Lakhtakia, S, Rech, A, Oldridge, D, Adamson, BJS, Chen, R. ISPOR Annual Meeting.

https://doi.org/10.1016/j.jval.2020.04.1752

Objectives

In oncology, accurate and reliable prognostic assessments improve decision-making in both clinical practice and research. We describe a generalizable, broadly applicable machine learning framework for predicting patient outcomes from oncology electronic health record (EHR) data.

Methods

The framework consists of four steps:
(i) Identify the research question, including the patient-level variable (the “label”) to predict.
(ii) Define the index date and observation window, and extract patient features for the relevant cohort.
(iii) Train models using standard cross-validation techniques to optimize predictive performance.
(iv) Evaluate the models on a held-out subset of the cohort: predict the outcome label, assess model performance, and rank features by predictive importance.
We applied this approach to a cohort of patients with multiple myeloma, building a model to predict 5-year mortality after autologous transplant.
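Steps (i) and (ii), together with the data splitting used in steps (iii) and (iv), can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Patient` record layout, the 365-day lookback window, the 5 × 365-day horizon, the rule of excluding patients censored before the horizon, and all helper names (`five_year_mortality_label`, `window_features`, `build_cohort`, `holdout_split`, `k_folds`) are hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional

# Hypothetical parameters; the abstract does not specify these values.
HORIZON = timedelta(days=5 * 365)   # 5-year outcome horizon (ignores leap days)
LOOKBACK = timedelta(days=365)      # observation window before the index date


@dataclass
class Patient:
    patient_id: str
    index_date: date                 # e.g. autologous transplant date
    last_followup: date              # last date the patient was known to be alive
    death_date: Optional[date] = None
    # (event date, feature name, value) tuples, e.g. lab results or test statuses
    events: list = field(default_factory=list)


def five_year_mortality_label(p: Patient) -> Optional[int]:
    """1 = died within the horizon, 0 = known alive at the horizon, None = censored."""
    horizon_end = p.index_date + HORIZON
    if p.death_date is not None:
        return 1 if p.death_date <= horizon_end else 0
    return 0 if p.last_followup >= horizon_end else None


def window_features(p: Patient) -> dict:
    """Most recent value of each feature recorded in [index - LOOKBACK, index]."""
    start = p.index_date - LOOKBACK
    feats = {}
    for when, name, value in sorted(p.events):
        if start <= when <= p.index_date:
            feats[name] = value      # later events overwrite earlier ones
    return feats


def build_cohort(patients):
    """One (patient_id, features, label) row per patient; censored patients are dropped."""
    rows = []
    for p in patients:
        label = five_year_mortality_label(p)
        if label is not None:
            rows.append((p.patient_id, window_features(p), label))
    return rows


def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and hold out a test subset that is never used for training."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_folds(rows, k=5, seed=0):
    """Yield (train, validation) pairs for k-fold cross-validation on the training rows."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    for i in range(k):
        train = [r for j, r in enumerate(shuffled) if j % k != i]
        val = shuffled[i::k]
        yield train, val


if __name__ == "__main__":
    patients = [
        Patient("a", date(2010, 1, 1), date(2012, 6, 1), death_date=date(2012, 6, 1),
                events=[(date(2009, 6, 1), "m_spike", 2.1), (date(2009, 12, 1), "m_spike", 1.4)]),
        Patient("b", date(2010, 1, 1), date(2016, 1, 1)),
        Patient("c", date(2014, 1, 1), date(2016, 1, 1)),  # censored: alive, < 5 years follow-up
    ]
    print(build_cohort(patients))  # [('a', {'m_spike': 1.4}, 1), ('b', {}, 0)]
```

Because each row here corresponds to one patient, splitting rows also splits patients, so no patient appears in both training and evaluation data. Plain shuffling keeps the sketch short; with an imbalanced label, stratified folds would usually be preferred.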

Results

The framework provided two outputs: 1) four standard machine learning models (logistic regression, random forest, support vector machine, and gradient boosted trees) with performance metrics (AUC, precision, recall, and accuracy); and 2) a list of patient features ranked by how strongly the models used them to predict the outcome. In the multiple myeloma analysis (n=1099), the label was 5-year survival after the transplant date, and EHR-defined features included demographics, medication administrations, lines of therapy, lab results, and cytogenetic or biomarker testing status. Random forest and gradient boosted trees achieved an AUC of 0.79 and an accuracy of 0.76; the most predictive features (highest absolute weight) were M-spike, chromosome 1 abnormalities, and age at diagnosis.
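The performance metrics and the absolute-weight feature ranking described above can be computed as in the following sketch, assuming binary labels and model scores in [0, 1]. The function names, the 0.5 decision threshold, and the example weights are hypothetical and are not results from the study. AUC is computed through its rank interpretation (the probability that a randomly chosen positive patient scores higher than a randomly chosen negative one, counting ties as one half), which is quadratic in cohort size but adequate for illustration.

```python
def roc_auc(y_true, y_score):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true, y_score, threshold=0.5):
    """AUC plus precision, recall, and accuracy at a fixed decision threshold."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    tn = pairs.count((0, 0))
    return {
        "auc": roc_auc(y_true, y_score),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(pairs),
    }


def rank_features(weights):
    """Sort (feature, weight) pairs by absolute weight, largest first."""
    return sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)


y_true = [1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(classification_metrics(y_true, y_score))
# Made-up weights for illustration only.
print(rank_features({"feature_a": 0.5, "feature_b": -1.2, "feature_c": 0.05, "feature_d": 0.8}))
# [('feature_b', -1.2), ('feature_d', 0.8), ('feature_a', 0.5), ('feature_c', 0.05)]
```

For logistic regression the weights would be model coefficients, whose magnitudes are only comparable when features share a common scale (for example, after standardization). Impurity-based importances from random forests and gradient boosted trees are non-negative, so ranking them by absolute value is the same as ranking by value.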

Conclusions

We have developed a generalizable machine learning framework, agnostic to the specific cancer diagnosis, that improves the prediction of specific outcomes and identifies potentially predictive features from oncology EHR-derived data. Customized models based on this framework could be applied to adverse event prediction, early detection of disease progression, and hospital readmission risk with relatively little duplicated effort, streamlining health economics and outcomes research (HEOR).

Sources:
ISPOR Annual Meeting
