
With real-world evidence, the whole is greater than the sum of its parts

Published April 2025

By James Roose, Senior Research Scientist at Flatiron Health


We recently celebrated a milestone at Flatiron: more than 1,500 publications across nearly 150 peer-reviewed journals have now drawn on our real-world evidence (RWE). Over the years, we’ve witnessed how real-world data (RWD) collected during routine care has transformed healthcare by fostering data-driven decision-making and streamlining drug development, ultimately contributing to better outcomes for millions of cancer patients.

Has RWE reached its full potential? Not by any stretch. A new class of evidence solutions is linking and combining datasets that used to stand alone, opening the door to exciting new research questions at every stage of the drug development lifecycle.

But first, let’s review the state of play.

RWE at a crossroads

RWD today comes in many shapes and forms: electronic health records (EHRs), of course, but also insurance claims, disease registries, wearable devices, patient surveys, mobile health apps, medical imaging, molecular testing, and even social media. In theory, the more data the better. In practice, working with multiple datasets can be daunting, and poor-quality or missing critical data elements can limit their usefulness.

Some datasets contain incomplete, inaccurate, or unvalidated data, and few are compatible with one another. Much clinically meaningful information remains trapped in unstructured data. Researchers today still spend too much time addressing data-related issues rather than testing hypotheses and developing actionable insights.

Challenges with data quality and with linking different datasets are not exclusive to the biomedical field, of course, but that doesn’t make them any less painful. “When data is collected from disparate data sources without centralized oversight and rigorous policies and procedures around data curation, it can quickly become a liability,” noted Flatiron VP of Research Oncology Neal Meropol, MD, in a recent webinar hosted by GenomeWeb. It also makes it nearly impossible for researchers to capitalize on potential synergies between those datasets.

Within the healthcare ecosystem, we recognize that gaps naturally exist across diverse datasets, and we are working diligently to bridge them by leveraging expansive molecular and clinical data so that all data has the power to contribute. Molecular data and clinical data are each extremely powerful on their own. However, when these datasets are of sufficient quality and depth and are properly combined, a whole new frontier of research opens up.

Clinical-molecular databases are paving the way

At Flatiron, we’ve been at the forefront of real-world data, developing two breakthrough linked databases in recent years:

  • The Clinical-Molecular Database (CMDB), a dataset of 78,000+ de-identified patient records that combines broad and detailed longitudinal clinical data from Flatiron Health with whole exome sequencing, whole transcriptome sequencing, immunohistochemistry (IHC), and digital pathology from Caris Life Sciences.
  • The Clinico-Genomic Database (CGDB), a joint dataset of 150,000+ de-identified patient records that combines broad and detailed longitudinal clinical data from Flatiron Health with next-generation sequencing (NGS) genomic profiling data from Foundation Medicine (FMI) covering more than 300 cancer-relevant genes and complex signatures.

These multimodal datasets can help researchers understand the relationship between genetic alterations (or changes in expression at the RNA or protein level) and important patient-centric clinical outcomes (like real-world progression-free survival).

As my colleague Kristi Savill, PhD, Director of Scientific Engagement for Precision Oncology, explained recently, those datasets can be used to fulfill a wide variety of research objectives, such as “identifying novel targets; profiling target prevalence across indications; uncovering associated clinical and biological features; refining product profiles; uncovering novel mechanisms of response or resistance to a therapy; exploring the biology associated with certain adverse events; or identifying potential new predictive or prognostic biomarkers.”

The importance of broad, representative real-world data

When a biomarker, such as a genetic mutation within a patient’s tumor, is central to one’s evidence strategy, linked datasets alone may not be able to answer every research question that arises. NGS testing and molecular profiling aren’t standard of care in every cancer type, and testing rates vary with factors such as demographics, tumor type, testing stage, and disease characteristics or subtypes. Patients who do not receive molecular profiling will not appear in linked clinical-molecular databases. This matters because some subpopulations of patients will be underrepresented in precision oncology-related real-world evidence research.

Among breast cancer patients, for instance, those with triple-negative breast cancer (TNBC) are more likely to undergo comprehensive genomic profiling than those with HR+/HER2- disease, in both the early and metastatic settings. Oncologists tend to order more testing than pathologists, and testing rates are typically higher in academic centers than in community clinics.

Multi-modal linked databases like the CGDB and CMDB are highly valuable for understanding the link between clinical characteristics or outcomes and tumor biology, but clinical data from a broader population of patients receiving care in the real-world setting can also help extend key findings to the general patient population. That’s where data like Flatiron Health’s Panoramic data comes in:

  • Flatiron Health Panoramic data contains longitudinal, EHR-derived clinical data drawn from over 100 sites (most of them in the community setting) and covers Flatiron’s entire nationwide network of over 5 million de-identified patient records. The data is representative of the patient population for more than 22 tumor types, including many underserved cohorts, and can be used to answer questions about patient characteristics, treatment patterns, patient outcomes, testing rates, and trends over time.

Consider a study in which researchers used the CMDB to assess the biology associated with resistance to certain treatments in breast cancer patients, and used Flatiron Panoramic data to contextualize those insights and to understand testing patterns, treatment patterns, and patient outcomes across a broader patient population.

Case study

Leveraging clinical-molecular data to understand mechanisms of resistance to CDK4/6i therapies in mBC

Cyclin-dependent kinase 4 and 6 inhibitors (CDK4/6i) combined with endocrine therapy are the standard of care for first-line treatment of HR+/HER2- metastatic breast cancer (mBC) in the US and can delay progression compared with endocrine therapy alone. However, many patients’ cancers eventually progress on these therapies.

The Flatiron-Caris Life Sciences CMDB offers researchers an opportunity to take a deep dive into the mechanisms that drive resistance to CDK4/6i-based therapies. Access to real-world clinical outcomes, like progression and response, alongside whole exome sequencing and whole transcriptome RNAseq data makes it possible to compare gene expression and the frequency of genetic mutations between patients who progressed on CDK4/6i and those who did not. This can provide evidence about the role of pathways known or expected to drive resistance to CDK4/6i therapies, or surface relevant new genes or pathways, which can in turn inform clinical trial design or drug development. The depth of the linked data also allows researchers to home in on differences in resistance mechanisms based on clinical or disease characteristics, such as the presence of metastases at certain anatomical sites (e.g., the liver).

Panoramic data enables a broader view: it can provide reference data on demographic and clinical characteristics and real-world outcomes (like real-world progression-free survival) for patients with certain clinical or disease features and for certain treatment sequences, regardless of whether their tumors were molecularly tested. This can contextualize findings from the analysis of the linked data.

Stronger together

In the study highlighted above, Panoramic data provides context and benchmarks, and may be used to describe testing patterns, while the multi-modal CMDB is used to assess the emergence and clinical relevance of genetic mutations and gene expression.

The prognostic significance of a novel biomarker can only be established if tested patients are not inherently expected to have better (or worse) outcomes than the general patient population. A large, longitudinal reference dataset like Flatiron’s Panoramic data can help researchers assess and minimize this bias by comparing the characteristics and outcomes of untested patients with those of tested patients receiving the same treatment.
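One widely used way to quantify whether tested and untested patients differ on a baseline characteristic is the standardized mean difference (SMD): the difference in group means divided by a pooled standard deviation. The sketch below is a minimal illustration under stated assumptions, not Flatiron's methodology. The helper name and the ages are hypothetical, and the 0.1 cutoff is a common rule of thumb rather than a fixed standard.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(tested: list[float], untested: list[float]) -> float:
    """Absolute SMD for a continuous covariate, using the pooled sample SD.

    Hypothetical helper for illustration only.
    """
    # Pool the two sample variances (simple average, a common SMD convention).
    pooled_sd = math.sqrt((variance(tested) + variance(untested)) / 2)
    if pooled_sd == 0:
        # Both groups are constant: equal means means no imbalance at all,
        # different means means the groups do not overlap.
        return 0.0 if mean(tested) == mean(untested) else math.inf
    return abs(mean(tested) - mean(untested)) / pooled_sd


# Hypothetical ages at diagnosis; NOT real patient data.
tested_ages = [60, 62, 64]
untested_ages = [70, 72, 74]

smd = standardized_mean_difference(tested_ages, untested_ages)
print(f"SMD for age: {smd:.2f}")  # SMD for age: 5.00
# Rule of thumb (assumption): |SMD| > 0.1 suggests meaningful imbalance.
print("imbalanced" if smd > 0.1 else "balanced")
```

In a real analysis this check would be repeated for each relevant characteristic, and categorical variables would need the proportion-based form of the SMD.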

The bottom line is this: with real-world data, as with many other things in life, the whole is much greater than the sum of its parts. While our work with Caris Life Sciences, Foundation Medicine, and other key partners is only beginning, the value of taking a holistic approach to evidence generation built on the combined strength of best-in-class RWD datasets is already very clear.

To learn more about Flatiron's Evidence Solutions and how they can maximize use cases across your oncology portfolio, please reach out.
