Purpose: To evaluate the interobserver variability in descriptions of breast masses by dedicated breast imagers and radiology residents and to determine how any differences in lesion description affect the performance of a computer-aided diagnosis (CAD) classification system.
Materials and Methods: Institutional review board approval was obtained for this HIPAA-compliant study, and the requirement to obtain informed consent was waived. Images of 50 breast lesions were individually interpreted by seven dedicated breast imagers and 10 radiology residents, yielding 850 lesion interpretations. Lesions were described with use of 11 descriptors from the Breast Imaging Reporting and Data System, and interobserver variability was calculated with the Cohen κ statistic. These 11 features, along with patient age, were merged by a linear discriminant analysis (LDA) classification model trained with 1005 previously existing cases. Variability in the recommendations of the computer model for different observers was also calculated with the Cohen κ statistic.
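As an illustration of the agreement measure named above, the following is a minimal sketch of the Cohen κ statistic for two observers, with a mean of pairwise values for larger groups. The function names, the example labels, and the choice to average pairwise κ values are assumptions made for illustration; the abstract does not state how agreement was aggregated across more than two observers.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(ratings_a, ratings_b):
    """Cohen kappa for two observers rating the same lesions (hypothetical helper)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both observers must rate the same non-empty set of lesions")
    n = len(ratings_a)
    # Observed agreement: fraction of lesions given the same descriptor.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each observer's marginal frequencies, summed over categories.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both observers used a single identical category; kappa is undefined,
        # so this sketch returns 1.0 by convention (an assumption).
        return 1.0
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(ratings_by_observer):
    """Average kappa over all observer pairs (an assumed aggregation, not the study's stated method)."""
    pairs = list(combinations(ratings_by_observer, 2))
    if not pairs:
        raise ValueError("at least two observers are required")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Invented example: two observers describing the shape of six lesions.
observer_1 = ["oval", "round", "oval", "irregular", "oval", "round"]
observer_2 = ["oval", "oval", "oval", "irregular", "round", "round"]
kappa = cohen_kappa(observer_1, observer_2)
```

Values near 0 indicate agreement no better than chance, and values near 1 indicate near-perfect agreement, which is the scale on which the κ values in the Results are reported.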
Results: A significant difference was observed for six lesion features, and radiology residents had greater interobserver variability in their selection of five of the six features than did dedicated breast imagers. The LDA model accurately classified lesions for both sets of observers (area under the receiver operating characteristic curve = 0.94 for residents and 0.96 for dedicated imagers). Sensitivity was maintained at 100% for residents and improved from 98% to 100% for dedicated breast imagers. For residents, the computer model could potentially improve the specificity from 20% to 40% (P < .01) and the κ value from 0.09 to 0.53 (P < .001). For dedicated breast imagers, the computer model could increase the specificity from 34% to 43% (P = .16) and the κ value from 0.21 to 0.61 (P < .001).
Conclusion: Among findings showing a significant difference, there was greater interobserver variability in lesion descriptions among residents; however, an LDA model using data from either dedicated breast imagers or residents yielded a consistently high performance in the differentiation of benign from malignant breast lesions, demonstrating potential for improving specificity and decreasing interobserver variability in biopsy recommendations.
© RSNA, 2010