
Comparison of population characteristics in real-world clinical oncology databases in the US: Flatiron Health, SEER, and NPCR

Published

July 2023

Citation

Ma X, Long L, Moon S, Adamson BJS, Baxi SS. Comparison of population characteristics in real-world clinical oncology databases in the US: Flatiron Health, SEER, and NPCR. medRxiv. 2023.06.07. https://www.medrxiv.org/content/10.1101/2020.03.16.20037143v3

Summary

The oncology field is evolving rapidly with advances in our understanding of cancer and the development of new treatments. Real-world data (RWD) has emerged as a valuable resource for clinical research, informing studies that range from epidemiology to intervention effectiveness. RWD is obtained from various sources, including billing records, disease registries, surveys, and electronic health records (EHRs), and EHRs have become a prominent source. In the US, the Surveillance, Epidemiology, and End Results Program (SEER) and the National Program of Cancer Registries (NPCR) are commonly used for oncology research. Flatiron Health generates RWD from its proprietary oncology-specific EHR, OncoEMR, as well as through EHR data integrations with academic research centers.

In the original study, the researchers aimed to describe the data sources, collection procedures, and demographic characteristics of disease-specific databases from Flatiron Health, SEER, and NPCR. Their goal was to better understand the strengths and limitations of each dataset and to give researchers information that is important for interpreting findings in cancer research.

In the April 2023 version, the authors incorporated an update to the birth year variable that reflects best practices in patient de-identification. With this change, the proportion of patients in the Flatiron Health database diagnosed over age 80 was overall more similar to the proportions in the SEER and NPCR databases.

Why this matters

Understanding how the RWD datasets differ in their collection methods and resulting patient populations can help researchers determine which dataset is fit-for-use for their specific research question. It can also help them contextualize the results obtained from each data source.
