Summary
Real-world data (RWD) collected outside clinical trials are increasingly used for research and clinical drug development. Electronic health records (EHRs) are especially useful for this research because they provide detailed clinical information about a patient’s disease journey. However, lab values and other variables in EHR-derived data are often missing, which creates challenges for statistical analysis. Naive approaches, such as analyzing only patients with complete records, may produce biased estimates. Imputation methods, in which missing values are predicted from other variables collected in the dataset, may help to reduce both bias and uncertainty. However, the appropriate analysis strategy depends on the missingness mechanism, that is, on whether the chance that a value is missing is unrelated to the data, depends only on observed variables, or depends on the unobserved value itself.
In this paper, researchers from Roche/Genentech and Flatiron Health outline a systematic workflow for characterizing missing data mechanisms and performing subsequent statistical analyses to address this issue.
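To make the distinction between a naive analysis and imputation concrete, the following is a minimal sketch on simulated data, not the workflow from the paper. All names, parameters, and the data-generating model are hypothetical: a lab value rises with age, and older patients are more likely to have it missing, so missingness depends only on an observed variable. The sketch first checks whether missingness is related to age, then compares a complete-case mean with a mean computed after simple regression imputation.

```python
import random
import statistics


def simulate(n=5000, seed=0):
    """Simulate (age, true_lab, observed_lab) rows; all parameters are illustrative."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(40, 80)
        lab = 1.0 + 0.05 * age + rng.gauss(0, 0.5)
        # Missingness depends only on observed age, not on the lab value itself.
        p_missing = 0.1 + 0.6 * (age - 40) / 40
        observed = None if rng.random() < p_missing else lab
        rows.append((age, lab, observed))
    return rows


def mean_age_by_missingness(rows):
    """Return (mean age where lab is missing, mean age where lab is observed)."""
    missing = [age for age, _lab, obs in rows if obs is None]
    present = [age for age, _lab, obs in rows if obs is not None]
    return statistics.fmean(missing), statistics.fmean(present)


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def compare_estimates(rows):
    """Return (true mean, complete-case mean, regression-imputed mean) of the lab value."""
    true_mean = statistics.fmean(lab for _age, lab, _obs in rows)
    complete = [(age, obs) for age, _lab, obs in rows if obs is not None]
    cc_mean = statistics.fmean(obs for _age, obs in complete)
    slope, intercept = fit_line([a for a, _ in complete], [o for _, o in complete])
    # Fill each missing lab value with its prediction from age.
    filled = [obs if obs is not None else intercept + slope * age
              for age, _lab, obs in rows]
    return true_mean, cc_mean, statistics.fmean(filled)


if __name__ == "__main__":
    rows = simulate()
    print("mean age (missing, observed):", mean_age_by_missingness(rows))
    print("true, complete-case, imputed means:", compare_estimates(rows))
```

In this simulated setting the complete-case mean is pulled toward younger patients, while the imputed mean stays close to the truth. Single deterministic imputation of this kind understates uncertainty, and it would not correct the bias if missingness depended on the unobserved lab value itself, which is why characterizing the mechanism comes first.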
Why this matters
This approach offers a way to address missing data in EHRs, which are a valuable source of data for healthcare research. By characterizing the missing data mechanisms, researchers can choose a suitable analysis and appropriately quantify uncertainty in their results. This improves the validity of real-world evidence research, is in line with guidance from regulatory agencies, and may ultimately lead to better outcomes for patients.