A framework for evaluating clinical artificial intelligence systems without ground-truth annotations

Published

March 2024

Citation

Kiyasseh, D., Cohen, A., Jiang, C. et al. A framework for evaluating clinical artificial intelligence systems without ground-truth annotations. Nat Commun 15, 1808 (2024). https://doi.org/10.1038/s41467-024-46000-9 

Our summary

Cancer care and research depend heavily on the availability of high-quality patient data, such as demographic information (e.g., race/ethnicity) and health status (e.g., ECOG performance status). However, in many electronic health records (EHRs), such information may be missing, limiting how confidently a research question can be answered, or whether it can be answered at all.

In a recent study published in Nature Communications, researchers at Cedars-Sinai Medical Center and Flatiron Health addressed this challenge by developing an artificial intelligence (AI) system that leverages patients’ clinical notes to infer their missing EHR information. Assessing the quality of this newly inferred information is essential, but the team found that doing so was hampered by the absence of ground-truth annotations. To address this, they introduced a general framework, SUDO, that enables researchers to quantify the performance of AI models deployed on data without ground-truth labels. The framework was shown to be applicable across multiple data modalities, including dermatology images, histopathology patches, and clinical notes. The hope is that SUDO can contribute to the deployment of trustworthy AI systems, positively influencing the way cancer care and research are conducted.
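To make the idea concrete, here is a minimal sketch of the pseudo-label discrepancy reasoning behind SUDO. This is an illustrative simplification, not the authors' exact procedure (see the paper for that): the synthetic dataset, subset size, model, and metric are all hypothetical choices. Each candidate label is temporarily assigned to a slice of unlabeled deployment data, a classifier is retrained on the augmented training set, and the assignment that degrades held-out performance less is taken as the more plausible label for that slice.

```python
# Illustrative sketch of a pseudo-label discrepancy check in the spirit of
# SUDO. All data and parameter choices here are hypothetical; consult the
# paper for the authors' exact procedure.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Labeled data stands in for the original training distribution; the
# "unlabeled" points stand in for deployment data without ground truth.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_lab, X_unlab, y_lab, _ = train_test_split(X, y, test_size=0.5, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_lab, y_lab, test_size=0.3, random_state=0
)

def heldout_score(candidate_label: int, X_u: np.ndarray) -> float:
    """Train on labeled data plus the unlabeled slice tagged with a
    candidate pseudo-label, then score on held-out labeled data."""
    X_aug = np.vstack([X_train, X_u])
    y_aug = np.concatenate([y_train, np.full(len(X_u), candidate_label)])
    clf = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
    return accuracy_score(y_val, clf.predict(X_val))

# Compare the candidate pseudo-labels on a slice of unlabeled points: the
# assignment that hurts held-out performance less is the more plausible
# label, and the gap between the two scores is the discrepancy signal.
subset = X_unlab[:100]
scores = {label: heldout_score(label, subset) for label in (0, 1)}
print(scores)
```

In practice, one would repeat such a comparison across slices of the deployment data (for example, grouped by the model's predicted probability) to estimate performance where no labels exist.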

Why this matters

The SUDO framework shows promise in assessing the reliability of AI predictions on data lacking ground-truth labels. As AI is increasingly used to extract information and to facilitate cancer research and care, it is essential that we understand how these models perform and ensure they work fairly across different patient groups. SUDO can be used to flag unreliable predictions, inform model selection, and evaluate algorithmic bias. While SUDO's robustness and scalability in diverse scenarios still need further exploration, this approach is a promising step toward more informed and ethical AI deployment.
