Purpose
This study aimed to compare the cost-effectiveness of AI-based approaches with manual approaches in ultrasound image quality control (QC).
Methods
Eligible ultrasonographers and pregnant volunteers were prospectively recruited from the Hunan Maternal and Child Health Hospital in May 2023. The ultrasonographers were randomly and evenly assigned to either the AI or Manual QC group, with baseline scores determined in June-July. From August to October, the groups received real-time AI QC or post-scan manual QC, with post-interventional scores recorded monthly. We applied repeated-measures analysis of variance to assess between-subject and within-subject effects and time trends in effectiveness (QC score improvement). An extra 50 pregnant volunteers underwent real-time manual QC, with their screening images utilized for post-scan AI and manual QC. The time cost of real-time AI QC was zero since it only required trainees’ involvement. We used Friedman’s M and Quade tests to compare multiple independent medians in cost assessment.
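As a rough illustration of the cost comparison, the sketch below computes Friedman's statistic for three QC conditions measured on the same volunteers. It is not the authors' analysis code: the timing values are synthetic placeholders, the function and variable names are hypothetical, the formula omits the tie correction, and the closed-form p-value holds only for three conditions (chi-square approximation with 2 degrees of freedom). The Quade test is not shown.

```python
import math


def average_ranks(values):
    """Rank values from 1..k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_three_conditions(rows):
    """Friedman statistic (no tie correction) and approximate p-value for k = 3."""
    k = 3
    n = len(rows)
    if n == 0 or any(len(row) != k for row in rows):
        raise ValueError("each row must hold exactly three measurements")
    rank_sums = [0.0] * k
    for row in rows:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    stat = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)
    # With k - 1 = 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Synthetic time costs in seconds per volunteer (assumed values, not study data):
# (real-time manual QC, post-scan AI QC, post-scan manual QC)
timings = [
    (310, 12, 240),
    (295, 15, 260),
    (330, 11, 250),
    (305, 14, 235),
    (320, 13, 255),
]

stat, p_value = friedman_three_conditions(timings)
print(f"Friedman statistic = {stat:.3f}, approximate p = {p_value:.4f}")
```

Ranking within each volunteer is what makes the test insensitive to how long an individual scan takes overall; only the ordering of the three QC conditions per volunteer contributes to the statistic.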
Results
This study recruited 14 ultrasonographers, equally divided into the AI and Manual QC groups. No significant difference existed between the groups concerning age, service year in perinatal diagnosis, male proportion, and QC frequency. The simple effect of the group revealed that the AI QC method outperformed the Manual QC method at least once (F = 13.113, P = 0.004, η² = 0.522). The simple effect of the month in the AI QC groups indicated an improvement in the mean QC scores (F = 9.827, P = 0.003, η² = 0.747) while that of manual QC groups suggested no improvement (F = 0.144, P = 0.931, η² = 0.041). Baseline scores were equal in June-July (F = 0.031, P = 0.864, η² = 0.003). However, the AI QC group surpassed the Manual QC group in August (F = 14.579, P = 0.002, η² = 0.549), September (F = 28.590, P < 0.001, η² = 0.704), and October (F = 35.411, P < 0.001, η² = 0.747). Within the Manual QC group, no significant differences were found in scores between June-July and August, September, or October (all P values of 1.000, nominal significance level of 0.0083). In contrast, the AI QC group showed significantly higher scores in August, September, and October co...