In response to the critical need for advanced solutions in medical imaging segmentation, particularly for real-time applications in diagnostics and treatment planning, this study introduces SM-UNet. This novel deep learning architecture efficiently addresses the challenge of real-time, accurate medical image segmentation by integrating convolutional neural network (CNN) with multilayer perceptron (MLP). The architecture uniquely combines an initial convolutional encoder for detailed feature extraction, MLP module for capturing long-range dependencies, and a decoder that merges global features with high-resolution CNN map. Further optimization is achieved through a tokenization approach, significantly reducing computational demands. Its superior performance is confirmed by evaluations on standard datasets, showing interaction times drastically lower than comparable networks—between 1/6 to 1/10, and 1/25 compared to SOTA models. These advancements underscore SM-UNet's potential as a groundbreaking tool for facilitating real-time, precise medical diagnostics and treatment strategies.