Purpose
Ultrasound image segmentation is a challenging task due to a low signal‐to‐noise ratio and poor image quality. Although several approaches based on convolutional neural networks (CNNs) have been applied to ultrasound image segmentation, they generalize poorly. We propose an end‐to‐end, multiple‐channel and atrous CNN designed to extract more semantic information for the segmentation of ultrasound images.
Method
A multiple‐channel and atrous convolution network, referred to as MA‐Net, is developed. Similar to U‐Net, MA‐Net is based on an encoder–decoder architecture and includes five modules: the encoder, atrous convolution, pyramid pooling, decoder, and residual skip pathway modules. In the encoder module, we aim to capture more information with multiple‐channel convolution and use large‐kernel convolutions instead of small filters in each convolution operation. In the last layer, atrous convolution and pyramid pooling are used to extract multi‐scale features. The architecture of the decoder is similar to that of the encoder module, except that up‐sampling is used instead of down‐sampling. Furthermore, the residual skip pathway module connects the subnetworks of the encoder and decoder to optimize learning from the deeper layers and improve segmentation accuracy. During the learning process, we adopt multi‐task learning to enhance segmentation performance. Five datasets are used in our experiments. Because the original training data are limited, we apply data augmentation (e.g., horizontal and vertical flipping, random rotations, and random scaling) to the training data. We use the Dice score, precision, recall, Hausdorff distance (HD), average symmetric surface distance (ASD), and root mean square symmetric surface distance (RMSD) as the metrics for segmentation evaluation. The Friedman test is used as a nonparametric statistical analysis to compare the algorithms.
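As an illustration of the overlap metrics named above (Dice score, precision, and recall), the following is a minimal Python sketch for binary masks. It is not the authors' evaluation code. The function name `overlap_metrics`, the nested‐list mask representation, and the convention of returning 1.0 when a denominator is zero are assumptions made for this example. The distance‐based metrics (HD, ASD, RMSD) are not covered here.

```python
def overlap_metrics(pred, truth):
    """Return (dice, precision, recall) for two binary masks of equal shape.

    Masks are nested lists of 0/1 values (a hypothetical representation
    chosen for this sketch). When a denominator is zero, the metric is
    set to 1.0; this convention is an assumption, not taken from the paper.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # foreground predicted and present
            elif p and not t:
                fp += 1          # foreground predicted but absent
            elif t and not p:
                fn += 1          # foreground present but missed

    dice_den = 2 * tp + fp + fn
    dice = 2 * tp / dice_den if dice_den else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return dice, precision, recall


# Toy masks invented for illustration only (not data from the study).
pred = [[0, 1, 1],
        [0, 1, 0]]
truth = [[0, 1, 0],
         [0, 1, 1]]

dice, precision, recall = overlap_metrics(pred, truth)
print(round(dice, 3), round(precision, 3), round(recall, 3))  # 0.667 0.667 0.667
```

In this toy case there are two true positives, one false positive, and one false negative, so all three metrics equal 2/3. Higher values indicate better overlap, whereas for HD, ASD, and RMSD lower values are better.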
Results
For brachial plexus (BP), fetal head, and lymph node segmentation, MA‐Net achieved average Dice scores of 0.776, 0.973, and 0.858; average precisions of 0.787, 0.968, and 0.854; average recalls of 0.788, 0.978, and 0.885; average HDs of 13.591, 10.924, and 19.245 mm; average ASDs of 4.822, 4.152, and 4.312 mm; and average RMSDs of 4.979, 4.161, and 4.930 mm, respectively. Compared with U‐Net, U‐Net++, M‐Net, and Dilated U‐Net, MA‐Net improved average performance by approximately 5.68%, 2.85%, 6.59%, 36.03%, 23.64%, and 31.71% for Dice, precision, recall, HD, ASD, and RMSD, respectively. Moreover, we verified that MA‐Net generalizes to the segmentation of lower‐grade glioma in brain MRI and to lung CT images. In addition, MA‐Net achieved the highest mean rank in the Friedman test.
Conclusion
The proposed MA‐Net segments ultrasound images accurately and generalizes well, making it a useful tool for diagnostic applications in ultrasound imaging.