“…This article compares a series of Convolutional Neural Networks (CNNs), such as ResNet-18, 34, 50, 101 (He et al, 2016), VGG11, 13, 16, 19 (Simonyan and Zisserman, 2014), DenseNet-121, 169 (Huang et al, 2017), Inception-V3 (Szegedy et al, 2016), Xception (Chollet, 2017), AlexNet (Krizhevsky et al, 2012), GoogleNet (Szegedy et al, 2015), MobileNet-V2 (Sandler et al, 2018), ShuffleNet-V2x0.5 (Ma et al, 2018), Inception-ResNet-V1 (Szegedy et al, 2017), and a series of visual transformers (VTs), such as vision transformer (ViT) (Dosovitskiy et al, 2020), BotNet (Srinivas et al, 2021), DeiT (Touvron et al, 2020), T2T-ViT (Yuan et al, 2021). The purpose is to find deep learning models that are suitable for EM small datasets.…”