Background:
The prevalence of nonalcoholic fatty liver disease is increasing worldwide, following trends similar to those of diabetes and obesity. Liver biopsy, the gold standard for diagnosis, is not favored because it is invasive. Meanwhile, noninvasive methods for evaluating fatty liver are either very expensive or show poor diagnostic performance, which limits their application. We developed neural network–based models that use B-mode ultrasound (US) images to assess fatty liver and classify its severity.
Methods:
This study is reported in accordance with the Standards for Reporting of Diagnostic Accuracy (STARD) guidelines. In this retrospective study, we used B-mode US images from a consecutive series of patients to develop four-class, two-class, and three-class diagnostic prediction models. Images were eligible if their diagnosis was confirmed by at least two gastroenterologists. We compared the following pretrained convolutional neural network models: Visual Geometry Group (VGG) 19, ResNet-50 v2, MobileNet v2, Xception, and Inception v2. We held out 20% of the dataset for validation, which yielded more than 100 images per severity category.
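A 20% per-class holdout can be sketched as follows. This is a minimal illustration assuming the split was stratified by severity label at the image level; the abstract does not state whether images from the same patient were kept in the same set (a patient-level split would be needed to avoid leakage). The function name `stratified_split` and the fixed seed are hypothetical; the class counts are those reported in the Results.

```python
import random
from collections import Counter

# Per-class image counts as reported in the Results (assumed input).
CLASS_COUNTS = {"normal": 11307, "mild": 4467, "moderate": 3155, "severe": 2926}

def stratified_split(labels, val_fraction=0.2, seed=0):
    """Hypothetical helper: hold out val_fraction of each class for validation.

    Returns (train_idx, val_idx) as lists of indices into `labels`.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train_idx, val_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                        # random order within the class
        n_val = round(len(idx) * val_fraction)  # same fraction from every class
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return train_idx, val_idx

# One label per image, expanded from the reported counts.
labels = [c for c, n in CLASS_COUNTS.items() for _ in range(n)]
train_idx, val_idx = stratified_split(labels)
val_counts = Counter(labels[i] for i in val_idx)
print(len(labels), dict(val_counts))
```

With these counts, every severity category receives several hundred validation images, consistent with the ">100 per category" figure above.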
Results:
The dataset comprised 21,855 images from 2,070 patients, classified as normal (N = 11,307), mild (N = 4,467), moderate (N = 3,155), or severe steatosis (N = 2,926). ResNet-50 v2 performed best and was selected as the final model. The areas under the receiver operating characteristic curve (AUROCs) were 0.974 (mild steatosis vs others), 0.971 (moderate steatosis vs others), 0.981 (severe steatosis vs others), 0.985 (any severity vs normal), and 0.996 (moderate-to-severe steatosis/clinically abnormal vs normal-to-mild steatosis/clinically normal).
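The "X vs others" AUROCs are one-vs-rest comparisons. The sketch below computes an AUROC as the probability that a randomly chosen positive image scores higher than a randomly chosen negative one (ties count one half), which equals the area under the ROC curve. It assumes the model outputs four class probabilities per image in the order normal, mild, moderate, severe. The grouped comparisons (any severity vs normal; moderate-to-severe vs normal-to-mild) are shown by summing the probabilities of the grouped classes. This is an illustrative assumption only: the Methods mention separately trained two-class models, which may be the actual source of those figures. All function names and the toy probabilities are hypothetical.

```python
from itertools import product

CLASSES = ["normal", "mild", "moderate", "severe"]  # assumed output order

def auc(scores, positives):
    """AUROC = P(random positive outscores random negative); ties count 0.5."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for sp, sn in product(pos, neg):
        if sp > sn:
            wins += 1.0
        elif sp == sn:
            wins += 0.5
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(probs, labels, target):
    """Score each image by the probability of `target`; positives are `target` images."""
    k = CLASSES.index(target)
    return auc([p[k] for p in probs], [y == target for y in labels])

def grouped_auc(probs, labels, positive_group):
    """Score each image by the summed probability of the grouped classes."""
    ks = [CLASSES.index(c) for c in positive_group]
    scores = [sum(p[k] for k in ks) for p in probs]
    return auc(scores, [y in positive_group for y in labels])

# Toy softmax outputs (hypothetical), one row per image.
probs = [
    [0.70, 0.20, 0.05, 0.05],  # normal
    [0.60, 0.30, 0.05, 0.05],  # normal
    [0.30, 0.50, 0.10, 0.10],  # mild
    [0.40, 0.25, 0.20, 0.15],  # mild, scored lower than one normal image
    [0.05, 0.15, 0.60, 0.20],  # moderate
    [0.05, 0.10, 0.25, 0.60],  # severe
]
labels = ["normal", "normal", "mild", "mild", "moderate", "severe"]

print(one_vs_rest_auc(probs, labels, "mild"))                        # 0.875
print(grouped_auc(probs, labels, ["mild", "moderate", "severe"]))    # 1.0
print(grouped_auc(probs, labels, ["moderate", "severe"]))            # 1.0
```

The pairwise loop is quadratic and is meant only for clarity; for thousands of validation images a rank-based computation or a library routine would be preferable.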
Conclusion:
Our deep learning models achieved predictive performance comparable to that of the most accurate, yet expensive, noninvasive diagnostic methods for fatty liver. Given their discriminative ability, including for mild steatosis, we expect these models to have a significant impact on the clinical assessment of fatty liver. However, several limitations remain to be overcome: machine-dependent variation, motion artifacts, the lack of confirmation by a second diagnostic tool, and hospital-dependent regional bias.