A framework for evaluating clinical artificial intelligence systems without ground-truth annotations


March 2024


Kiyasseh, D., Cohen, A., Jiang, C. et al. A framework for evaluating clinical artificial intelligence systems without ground-truth annotations. Nat Commun 15, 1808 (2024). 

Our summary

Cancer care and research depend heavily on the availability of high-quality patient data, such as demographic information (e.g., race/ethnicity) and the patient's health status (e.g., ECOG performance status). However, in many electronic health records (EHRs), such information is missing, which limits how confidently a question can be answered, or whether it can be answered at all.

In a recent study published in Nature Communications, researchers at Cedars-Sinai Medical Center and Flatiron Health addressed this challenge by developing an artificial intelligence (AI) system that uses patients’ clinical notes to infer their missing EHR information. Assessing the quality of such newly inferred information is essential, but the team found that doing so was hampered by the absence of ground-truth annotations. To address this, they introduced a general framework (SUDO) that enables researchers to quantify the performance of AI models deployed on data without ground-truth labels. The framework was shown to be applicable across multiple data modalities, including dermatology images, histopathology patches, and clinical notes. The hope is that SUDO can contribute to the deployment of trustworthy AI systems and positively influence the way cancer care and research are conducted.

Why this matters

The SUDO framework shows promise in assessing the reliability of AI predictions on data lacking ground-truth labels. As AI is increasingly used to extract information and to facilitate cancer research and care, it is essential that we understand how these models perform and ensure they work fairly across different patient groups. SUDO can be used to better identify unreliable predictions, help with model selection, and evaluate algorithmic bias. While SUDO's robustness and scalability in diverse scenarios still need further exploration, this approach is a promising step toward more informed and ethical AI deployment.
