Overview
Accurately capturing patient performance status is essential for oncology research and clinical decision-making, yet in Japan, the Eastern Cooperative Oncology Group (ECOG) performance status is typically recorded only in unstructured clinical notes, making data collection challenging and resource-intensive.
This study evaluated whether large language models (LLMs) could automatically extract ECOG performance status from Japanese electronic health records, which would make real-world data curation scalable. Compared with manual abstraction by human reviewers, the LLM achieved 100% sensitivity and 100% precision in identifying performance status values. Notably, among records where human reviewers did not find a performance status, the model identified one in approximately 50% of cases. The computational cost of the automated approach was estimated at less than 5% of the cost of traditional manual data abstraction.
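To make the sensitivity and precision figures above concrete, here is a minimal Python sketch of how such agreement metrics could be computed against a human-abstracted reference. It is an illustration only, not the study's evaluation code: the function name `agreement_metrics`, the record layout, and the counting conventions (a wrong value counts as both a false positive and a false negative; records with no human value are tallied separately rather than scored) are all assumptions, and the sample pairs are toy values, not study data.

```python
from typing import Dict, Optional, Sequence, Tuple

# Each pair is (human_value, model_value); None means "no ECOG value found".
# This layout is a hypothetical one chosen for illustration.
Pair = Tuple[Optional[int], Optional[int]]


def agreement_metrics(pairs: Sequence[Pair]) -> Dict[str, Optional[float]]:
    """Score model extractions against human abstraction (assumed conventions)."""
    tp = fp = fn = 0
    human_missing = 0  # records where the human reviewer found no value
    model_only = 0     # of those, records where the model did find a value

    for human, model in pairs:
        if human is None:
            # No human reference: tally separately instead of scoring.
            human_missing += 1
            if model is not None:
                model_only += 1
        elif model is None:
            fn += 1        # model missed a value the human found
        elif model == human:
            tp += 1        # model agrees with the human reference
        else:
            fp += 1        # assumption: a wrong value is both a false
            fn += 1        # positive and a false negative

    def ratio(num: int, den: int) -> Optional[float]:
        # Return None rather than dividing by zero.
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "precision": ratio(tp, tp + fp),
        "model_only_rate": ratio(model_only, human_missing),
    }


# Toy example (not study data): three agreeing records and two records
# with no human value, one of which the model fills in.
toy_pairs = [(1, 1), (0, 0), (2, 2), (None, 1), (None, None)]
toy_result = agreement_metrics(toy_pairs)
```

Under these conventions, sensitivity is TP / (TP + FN) and precision is TP / (TP + FP). Other studies may score wrong values or model-only finds differently, so the counting rules should be stated alongside any reported figures.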
Why this matters
This research demonstrates that LLM-based extraction of ECOG performance status from Japanese clinical notes is both highly accurate and cost-effective. Automating this process allows researchers to scale real-world data collection across Japan more efficiently, enabling richer longitudinal datasets for cancer research. Further development of this approach in Japan will build on Flatiron’s focused efforts to validate LLM-extracted real-world data globally, including publications such as the VALID Framework, the industry's first comprehensive approach to evaluating AI-extracted real-world data. Finally, this work supports international research collaboration by producing consistent, harmonized clinical data across countries, which improves the ability to conduct real-world evidence studies that inform treatment decisions for cancer patients globally.