This episode explores a study from Radiology Advances tackling one of AI's toughest challenges in medical imaging: consistent pancreas segmentation across CT scans. The authors benchmarked multiple models against multi-reader human consensus and introduced a new metric, Fractional Threshold (FT), to measure robustness. Their human-in-the-loop workflow flagged just 5% of cases for expert review, matching human reliability while cutting annotation time 23-fold.
Benchmarking Robustness of Automated CT Pancreas Segmentation: Achieving Human-Level Reliability Through Human-in-the-Loop Optimization. Oviedo et al. Radiology Advances, Volume 2, Issue 6, November 2025, umaf040.