“…With recent advances in machine learning, numerous deep learning-based techniques have been proposed for the automated extraction of morphological features. These include convolutional neural networks (CNNs); pre-trained deep CNNs [17] such as AlexNet, VGG 16, VGG 19, ResNet 50 [18] and MobileNet [19]; multimodal fusion with CoaT (coat-lite-small), PiT (pooling-based vision transformer, pits-distilled-224), ViT (vision transformer, small-patch16-384), ResNetV2 and ResNetY [20]; and concatenated VGG 16 and Inception V3 models [21]. After feature extraction, the images were classified into normal and OSCC categories using various classifiers, such as random forest [22], support vector machine (SVM) [10], extreme gradient boosting (XGBoost) with binary particle swarm optimization (BPSO) feature selection [23], K-nearest neighbor (KNN) [10], a duck patch optimization-based deep learning method [24], and two pre-trained models, ResNet 50 and DenseNet 201 [11]. However, as the number of network layers increases, the computational complexity also increases.…”
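The two-stage pattern described above (a convolutional feature extractor followed by a separate classical classifier) can be illustrated with a minimal NumPy sketch. It is not a reproduction of any cited method: the functions `extract_features`, `knn_predict` and `make_synthetic` are hypothetical names introduced here, the "backbone" is a single layer of fixed, untrained 3×3 filters with ReLU and global average pooling standing in for a pretrained CNN such as ResNet 50, and the "normal" and "OSCC" patches are synthetic arrays that differ only in mean intensity. KNN is used as the second stage because it is one of the classifiers listed above and needs no training loop.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def extract_features(images: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a frozen CNN backbone (NOT a pretrained model).

    images:  (N, H, W) grayscale patches
    kernels: (K, 3, 3) fixed, untrained filters
    returns: (N, K) feature vectors
    """
    # All 3x3 windows of every image: shape (N, H-2, W-2, 3, 3)
    windows = sliding_window_view(images, (3, 3), axis=(1, 2))
    # Valid cross-correlation with each kernel: shape (N, K, H-2, W-2)
    maps = np.einsum("nhwij,kij->nkhw", windows, kernels)
    maps = np.maximum(maps, 0.0)  # ReLU non-linearity
    return maps.mean(axis=(2, 3))  # global average pooling -> (N, K)


def knn_predict(train_x, train_y, test_x, k=3):
    """Majority-vote k-nearest-neighbour classifier on feature vectors."""
    # Pairwise Euclidean distances, shape (n_test, n_train)
    dists = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=-1)
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Labels: 0 = normal, 1 = OSCC; an odd k avoids ties in binary voting
    return (train_y[nearest].mean(axis=1) > 0.5).astype(int)


def make_synthetic(rng, n_per_class, size=8):
    """Toy data (assumption): 'OSCC' patches are brighter on average."""
    normal = rng.normal(0.0, 1.0, (n_per_class, size, size))
    oscc = rng.normal(2.0, 1.0, (n_per_class, size, size))
    x = np.concatenate([normal, oscc])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return x, y


rng = np.random.default_rng(0)
kernels = rng.uniform(0.0, 1.0, (4, 3, 3))  # fixed filters, never trained
train_imgs, train_y = make_synthetic(rng, 20)
test_imgs, test_y = make_synthetic(rng, 10)

# Stage 1: feature extraction; Stage 2: classical classifier on the features
train_feats = extract_features(train_imgs, kernels)
test_feats = extract_features(test_imgs, kernels)
pred = knn_predict(train_feats, train_y, test_feats, k=3)
accuracy = (pred == test_y).mean()
print(f"test accuracy: {accuracy:.2f}")
```

Keeping the extractor frozen and swapping only the second stage is what allows the same features to be fed to random forest, SVM, XGBoost or KNN; in a real pipeline the stand-in extractor would be replaced by the penultimate-layer activations of a pretrained network, whose depth drives the computational cost noted above.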