Objectives: Few studies have evaluated and compared artificial intelligence (AI) models for breast cancer detection on the same dataset. This study aimed to compare the performance of three AI algorithms on the same dataset at a freely chosen operating threshold and to evaluate how performance depends on the chosen threshold.
Materials and methods: In this retrospective single-center study, a dataset of 314 2D screening mammograms performed between 2012 and 2020 was assembled, with a 19.6% prevalence of histologically proven cancer. Three AI vendors with CE-marked (European conformity) algorithms agreed to take part in the study by running their algorithm on this cancer-enriched cohort. Each vendor freely chose the operating threshold used to distinguish benign from malignant mammograms. The algorithm was required to mark and correctly locate the most suspicious lesion. Diagnostic performance metrics, including sensitivity and specificity, were calculated and compared between the algorithms.
Results: At the vendor-chosen thresholds, AI 1 offered the best trade-off between sensitivity (74%) and specificity (79%). AI 2 had a significantly lower sensitivity (52%; p < 0.05) but a higher specificity (98.4%) than the other two algorithms. AI 3 had a sensitivity (69.9%) not significantly different from that of AI 1 but a significantly lower specificity (45.3%; p < 0.001) than the other two algorithms.
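The abstract does not state which statistical test was used to compare the algorithms. Because all three algorithms were run on the same mammograms, a paired test is a natural choice; one common option is the exact McNemar test, sketched below as an assumption rather than as the authors' method. The function name `mcnemar_exact` and the case lists are hypothetical toy data, not the study's results.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar test for paired binary outcomes.

    correct_a, correct_b: one boolean per case (e.g. per cancer when
    comparing sensitivities), True if that algorithm classified the case
    correctly. Hypothetical helper, not the study's analysis code.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("outcome lists must be paired case by case")
    # Only discordant pairs carry information about a difference.
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = b + c
    if n == 0:
        return 1.0
    # Under H0 each discordant case favours either algorithm with p = 0.5,
    # so min(b, c) follows Binomial(n, 0.5); double the lower tail.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical detections on 37 cancers (toy data, not study results):
# 10 found only by A, 2 found only by B, 20 found by both, 5 missed by both.
ai_a = [True] * 10 + [False] * 2 + [True] * 20 + [False] * 5
ai_b = [False] * 10 + [True] * 2 + [True] * 20 + [False] * 5
print(round(mcnemar_exact(ai_a, ai_b), 4))  # prints 0.0386
```

The same function would compare specificities by passing per-benign-case outcomes instead of per-cancer outcomes.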
Performance varied with the chosen threshold: when the AI 2 threshold was lowered, its sensitivity increased to 69.9% while its specificity remained high (86.1%).
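The trade-off described above follows directly from how an operating threshold turns a continuous AI suspicion score into a benign/malignant call. The sketch below assumes that each mammogram receives a single score and is called malignant when the score is at or above the threshold; the 0 to 100 scale, the `sensitivity_specificity` helper, and the scores are hypothetical, since the abstract does not describe the vendors' score formats. It also ignores the lesion-localization requirement for simplicity.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) at one operating threshold.

    Assumption: a mammogram is called malignant when score >= threshold.
    labels: 1 = histologically proven cancer, 0 = no cancer.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores: 5 cancers followed by 7 non-cancers (not study data).
scores = [95, 88, 72, 60, 41, 85, 55, 45, 35, 30, 20, 10]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

# Lowering the threshold flags more mammograms as malignant:
# sensitivity rises while specificity falls.
for t in (80, 50, 40):
    sens, spec = sensitivity_specificity(scores, labels, t)
    print(f"threshold={t}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these toy scores the loop prints sensitivity/specificity pairs of 0.40/0.86, 0.80/0.71, and 1.00/0.57, mirroring the direction of change reported for AI 2 without reproducing its values.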
Conclusion: When their thresholds were optimized, two AI algorithms stood out, achieving acceptable sensitivity and specificity.