Bigger, deeper, faster real-world data opens new horizons in hematologic oncology research

Published

December 2025

By

Dr. Ahmed Sawas, MD, Medical Director at Flatiron Health

Therapeutic innovation in hematologic malignancies has accelerated dramatically over the past decade. What was once a landscape dominated by non-specific cytotoxic chemotherapy and limited biologic options has evolved into an era defined by precision immunotherapies and molecularly targeted agents. Breakthroughs such as CAR T-cell therapy, bispecific antibodies, and targeted agents like menin inhibitors, alongside ultra-sensitive tools for detecting measurable residual disease (MRD), have fundamentally reshaped treatment paradigms across multiple blood cancers.

In my clinical practice, where I specialize in lymphoid malignancies including aggressive and indolent B-cell lymphomas, chronic lymphocytic leukemia, and peripheral T-cell lymphomas, I’ve seen firsthand how these innovations have translated into meaningful clinical benefit. Diseases that were once rapidly fatal are increasingly managed as chronic conditions, with improvements in depth of response, durability of remission, and overall survival. However, these gains have also introduced greater complexity in clinical decision-making, with treatment strategies increasingly dependent on biomarkers, disease biology, prior lines of therapy, and patient-specific factors.

This evolving therapeutic landscape amplifies the importance of robust real-world evidence (RWE). As treatments become more personalized, multimodal, and longitudinal, and patient journeys become more complex, the need for real-world data (RWD) with greater clinical depth, scale, and timeliness has grown. Capturing elements such as MRD status, CAR T utilization, and the nuances of sequencing targeted therapies is essential to fully understanding contemporary treatment patterns, adherence, and outcomes, especially for rare malignancies defined by specific genetic markers. Strengthening and expanding the real-world evidence ecosystem will position us to better support clinical research and accelerate access to emerging therapies.

That’s why it’s time for bigger, deeper, faster data.

Half a million patient records and counting

Flatiron has just released six new longitudinal datasets for hematology-oncology research: five for B-cell lymphomas and one for multiple myeloma. These datasets represent a sixfold increase in cohort sizes compared to previously available datasets and now cover over 505K de-identified patients from US community and academic research centers, with international expansions already underway.

At Flatiron, we call our longitudinal, EHR-derived datasets ‘Panoramic data’ because their scale gives clinicians and researchers an unparalleled view of specific diseases across large, well-defined patient populations — but size alone has limited value. Hematologists treat individual patients, not broad population segments, and today’s precision-medicine environment demands more than just big data; it requires deep data. To study patients historically underrepresented in clinical research and address critical gaps in blood cancer research, researchers need the ability to zoom in with confidence on subcohorts defined by specific biomarkers or exposed to rare therapies, and to evaluate effectiveness and safety across a wide range of disease settings.

Breadth and depth are inherently complementary, and thanks to AI-powered extraction and data validation capabilities, we were able to develop our new hematology datasets so that researchers wouldn’t have to sacrifice one for the other.

How AI is helping us build credible and representative datasets

Electronic health records (EHRs) present well-recognized challenges for secondary research use. Despite considerable progress over the past 10 to 15 years, most interesting EHR variables (diagnosis statuses and dates, biomarker testing results, treatment details, and most outcomes) still reside in unstructured documents like clinical narratives and pathology reports. AI, natural language processing, and machine learning algorithms are allowing us to extract that data at scale. But how does that automation impact quality?

At ESMO AI 2025, my Flatiron colleague Melissa Estevez presented the VALID framework, a new approach designed to ensure that EHR data extracted by machine learning and large language models (LLMs) remains clinically plausible, comparable in quality to human-abstracted data, and able to replicate historical results. We used that framework to validate our new B-cell lymphoma and multiple myeloma datasets, and as we ramp up efforts to grow our fit-for-purpose hematology datasets, every new patient record goes through the same rigorous validation checks.
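
As a rough illustration of what a "comparable in quality to human-abstracted data" check can look like, the sketch below scores a machine-extracted yes/no variable against human abstraction for the same patients. It is not the VALID framework itself: the function name, the toy labels, and the choice of metrics (sensitivity, positive predictive value, and F1) are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Agreement:
    sensitivity: float  # share of human-abstracted positives the model also found
    ppv: float          # share of model positives confirmed by human abstraction
    f1: float           # harmonic mean of sensitivity and PPV


def score_extraction(human: list[bool], model: list[bool]) -> Agreement:
    """Compare model-extracted labels with human-abstracted labels (the reference)."""
    if len(human) != len(model):
        raise ValueError("label lists must cover the same patients")
    tp = sum(h and m for h, m in zip(human, model))        # both say yes
    fp = sum((not h) and m for h, m in zip(human, model))  # model yes, human no
    fn = sum(h and (not m) for h, m in zip(human, model))  # human yes, model no
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    denom = sensitivity + ppv
    f1 = 2 * sensitivity * ppv / denom if denom else 0.0
    return Agreement(sensitivity, ppv, f1)


# Hypothetical labels for one variable (e.g. "MRD test documented") on 8 patients.
human_labels = [True, True, True, False, False, True, False, False]
model_labels = [True, True, False, False, True, True, False, False]
result = score_extraction(human_labels, model_labels)
print(result)
```

In practice a check like this would be run per variable and per subcohort, since an extraction model that performs well overall can still underperform in a rare subgroup.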

The result is a step change in our ability to generate real-world datasets that are simultaneously larger in scale, richer in clinical detail, and faster to deliver than prior approaches. This capability is essential in an era of rapid therapeutic innovation, enabling timely, high-confidence evidence generation while maintaining the methodological rigor and data quality standards required for clinical, regulatory, and scientific use.

New global evidence is already making a big difference

What type of evidence are these new AI-powered datasets producing?

At ASH 2025, Flatiron presented 12 research papers spanning hematologic malignancies — from B-cell to mantle cell lymphoma, multiple myeloma, chronic lymphocytic leukemia, and B-cell acute lymphoblastic leukemia. These include:

  • Barriers and Bridges: Real-World CAR T Delivery Across US Oncology Practices
    We used Flatiron’s new hematology datasets to evaluate national patterns of CAR T utilization, referral dynamics, and access barriers across academic and community practices in the US. In particular, we analyzed toxicities such as cytokine release syndrome and immune effector cell-associated neurotoxicity syndrome, as well as healthcare burden, allowing us to address critical gaps in our understanding of how this transformative therapy is implemented across care settings.
  • What Remains Matters: Real-World Impact of Measurable Residual Disease Testing in Multiple Myeloma
    MRD offers meaningful insights into disease status, treatment response, and prognosis. Initially adopted in clinical trials, MRD is being explored to guide treatment decisions in real-world settings, but data on real-world adoption, testing patterns, impact on treatment course, and outcomes remain limited. By examining 12,000 patients with multiple myeloma in the US, we were able to confirm that MRD negativity was strongly correlated with improved patient outcomes, such as real-world time to next treatment (rwTTNT) and real-world overall survival (rwOS), and to provide tangible recommendations for using MRD to inform treatment duration and intensity.
  • Real-world insights into diffuse large B-cell lymphoma from EHR-derived data in Germany and the United Kingdom
    DLBCL is a common and aggressive form of lymphoma, and new treatment options (in first as well as later lines) are evolving rapidly around the world. To understand regional variations in patients and care, we examined 1,000 patients in Germany and the UK and uncovered important differences in therapeutic approaches, including treatment regimens and biomarker testing. This research demonstrates the benefits of high-quality, multinational real-world evidence and is paving the way for stronger international collaboration.

Looking ahead

By uniting AI-powered data extraction with rigorous, fit-for-purpose data validation, Flatiron is building a next-generation evidence infrastructure designed to meet the growing complexity of research in hematologic oncology. This foundation enables real-world evidence that evolves alongside scientific innovation, supporting more nuanced questions, faster insight generation, and new frontiers in blood cancer research. Today, this new class of real-world data is already empowering hematologists across a broad range of research objectives:

  • Study rare cohorts and large cohorts in more detail
  • Support emerging therapies (like trispecific antibodies)
  • Set up external comparator cohorts to accelerate trials
  • Bridge the gap between academic and community care
  • Compare treatment patterns across health systems
  • Foster collaboration across industry and HTA bodies
  • Uncover real-world inequities
  • Inform practice and policy

Most importantly, it’s showing that RWD can be used to look ahead, not just backwards. Thanks to the scale, depth, and speed of the data in our new Panoramic datasets, hematology researchers can now generate synthetic controls, digital twins, and predictive models grounded in deeply phenotyped populations and biologically meaningful endpoints like MRD; life science companies can accelerate trials and approval; HEOR specialists can demonstrate equity gaps; and clinicians can offer more effective and personalized treatment options to their patients.

Beyond clinical endpoints, these data enable systematic insight into physician and patient experiences with therapy in routine practice, including perceptions, practical barriers, and real-world tolerability. Capturing this contextual signal from clinical documentation is essential for understanding adoption patterns and aligning evidence generation with the realities of care delivery.

We are entering a new era in hematologic research, one that brings us closer to the central promise of real-world evidence: that every patient’s lived clinical experience can systematically inform and improve outcomes for the patients who follow.

To learn more about how real-world evidence can support your research priorities, reach out to us.
