Background
Breast tumors pose a potentially fatal threat to women's health. Ultrasound (US) is a common and economical method for diagnosing breast cancer. Breast Imaging Reporting and Data System (BI‐RADS) category 4 has the highest false‐positive rate, about 30%, among the five categories. Classification within BI‐RADS category 4 is challenging and has not been fully studied.

Purpose
This work aimed to use convolutional neural networks (CNNs) to classify category 4 breast tumors from B‐mode images, reducing dependence on the operator and on artifacts. It also aimed to take full advantage of the morphological and textural features in breast tumor US images to improve classification accuracy.

Methods
First, original US images obtained directly from the hospital were cropped and resized. Of the 1385 B‐mode US BI‐RADS category 4 images, biopsy identified 503 as benign and 882 as malignant. Then, K‐means clustering and sliding‐window entropy were computed on the US images. Because the original B‐mode images, K‐means clustering images, and entropy images carry different characteristic information about malignant and benign tumors, they were fused as three channels to form a multi‐feature fusion image dataset. The training, validation, and test sets contained 969, 277, and 139 images, respectively. With transfer learning, 11 CNN models, including DenseNet and ResNet, were investigated. Finally, the better‐performing models were selected by comparing accuracy, precision, recall, F1‐score, and area under the curve (AUC). Normality of the data was assessed with the Shapiro‐Wilk test. The DeLong test and the independent t‐test were used to evaluate the significance of differences in AUC and in the other metrics. The false discovery rate was used to confirm the advantage of the CNN with the highest evaluation metrics.
In addition, anti‐log compression was studied, but it did not improve the CNN classification results.

Results
With multi‐feature fusion images, DenseNet121 achieved the highest accuracy among the CNNs, 80.22 ± 1.45%, with a precision of 77.97 ± 2.89% and an AUC of 0.82 ± 0.01. Multi‐feature fusion improved the accuracy of DenseNet121 by 1.87% over classification of the original B‐mode images (p < 0.05).

Conclusion
CNNs with multi‐feature fusion images show good potential for reducing the false‐positive rate in breast tumors within US BI‐RADS category 4 and for making the diagnosis of category 4 breast tumors more accurate and precise.
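
The three‐channel fusion described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the number of clusters (k = 3), the window size (9), the number of histogram bins (16), the channel order, and the assumption of 8‐bit grayscale input are all illustrative choices that the abstract does not specify. The sketch uses a simple one‐dimensional K‐means on pixel intensities and a histogram‐based Shannon entropy in a sliding window, written with NumPy only.

```python
import numpy as np


def kmeans_image(img, k=3, iters=20):
    """Cluster pixel intensities with a 1-D K-means (k is an assumed value)
    and return an image where each pixel holds its cluster center."""
    pixels = img.reshape(-1).astype(np.float64)
    # Deterministic initialization: centers spread evenly over the intensity range.
    centers = np.linspace(pixels.min(), pixels.max(), k)
    for _ in range(iters):
        labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            members = pixels[labels == j]
            if members.size:
                centers[j] = members.mean()
    labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
    return centers[labels].reshape(img.shape)


def local_entropy(img, window=9, bins=16):
    """Shannon entropy (in bits) of the quantized intensity histogram inside a
    sliding window. Assumes 8-bit input; window and bins are assumed values."""
    q = (img.astype(np.float64) / 256.0 * bins).astype(np.int64)
    q = np.clip(q, 0, bins - 1)
    pad = window // 2
    padded = np.pad(q, pad, mode="reflect")
    # Shape (H, W, window, window): one window per output pixel.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (window, window))
    entropy = np.zeros(img.shape, dtype=np.float64)
    n = window * window
    for b in range(bins):
        p = (windows == b).sum(axis=(2, 3)) / n
        nonzero = p > 0
        entropy[nonzero] -= p[nonzero] * np.log2(p[nonzero])
    return entropy


def to_uint8(x):
    """Rescale an array linearly to the 0..255 range."""
    x = x.astype(np.float64)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros(x.shape, dtype=np.uint8)
    return np.round((x - x.min()) / span * 255).astype(np.uint8)


def fuse_features(img):
    """Stack the original image, its K-means image, and its entropy image
    as three channels (the channel order is an assumption)."""
    return np.stack(
        [img.astype(np.uint8), to_uint8(kmeans_image(img)), to_uint8(local_entropy(img))],
        axis=-1,
    )


if __name__ == "__main__":
    # Synthetic stand-in for a cropped and resized B-mode image.
    rng = np.random.default_rng(0)
    demo = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
    print(fuse_features(demo).shape)
```

The fused array has the same height and width as the input and three channels, so it can be fed to an ImageNet‐pretrained CNN in place of an RGB image, which is presumably what makes transfer learning convenient here.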