Gastrointestinal (GI) disorders, encompassing conditions like cancer and Crohn’s disease, pose a significant threat to public health. Endoscopic examinations have become crucial for diagnosing and treating these disorders efficiently. However, the subjective nature of manual evaluations by gastroenterologists can lead to potential errors in disease classification. In addition, the difficulty of diagnosing diseased tissues in GI and the high similarity between classes made the subject a difficult area. Automated classification systems that use artificial intelligence to solve these problems have gained traction. Automatic detection of diseases in medical images greatly benefits in the diagnosis of diseases and reduces the time of disease detection. In this study, we suggested a new architecture to enable research on computer-assisted diagnosis and automated disease detection in GI diseases. This architecture, called Spatial-Attention ConvMixer (SAC), further developed the patch extraction technique used as the basis of the ConvMixer architecture with a spatial attention mechanism (SAM). The SAM enables the network to concentrate selectively on the most informative areas, assigning importance to each spatial location within the feature maps. We employ the Kvasir dataset to assess the accuracy of classifying GI illnesses using the SAC architecture. We compare our architecture’s results with Vanilla ViT, Swin Transformer, ConvMixer, MLPMixer, ResNet50, and SqueezeNet models. Our SAC method gets 93.37% accuracy, while the other architectures get respectively 79.52%, 74.52%, 92.48%, 63.04%, 87.44%, and 85.59%. The proposed spatial attention block improves the accuracy of the ConvMixer architecture on the Kvasir, outperforming the state-of-the-art methods with an accuracy rate of 93.37%.