Convolutional neural networks (CNNs) are widely applied to skin disease image segmentation because of their powerful information discrimination abilities, and they have achieved good results. However, CNNs struggle to capture long-range contextual dependencies when extracting deep semantic features from lesion images, and the resulting semantic gap leads to blurred segmentation of skin lesions. To address these problems, we designed a hybrid encoder network based on transformer and multilayer perceptron (MLP) architectures, which we call HMT-Net. In HMT-Net, the attention mechanism of the CTrans module learns the global relevance of the feature map, improving the network’s ability to understand the overall foreground information of the lesion. The TokMLP module, in turn, enhances the network’s ability to learn the boundary features of lesion images: its tokenized MLP axial displacement operation strengthens the connections between pixels, which facilitates the extraction of local feature information. To verify the segmentation performance of our network, we conducted extensive experiments comparing HMT-Net with several recently proposed Transformer and MLP networks on three public datasets (ISIC2018, ISBI2017, and ISBI2016). Our method achieves 82.39%, 75.53%, and 83.98% on the Dice index and 89.35%, 84.93%, and 91.33% on the IOU, respectively. Compared with the latest skin disease segmentation network, FAC-Net, our method improves the Dice index by 1.99%, 1.68%, and 1.6% and the IOU by 0.45%, 2.36%, and 1.13%, respectively. These experimental results show that HMT-Net achieves state-of-the-art performance compared with other segmentation methods.
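The Dice index and IOU reported above are overlap measures between a predicted lesion mask and its ground-truth annotation. As an illustration only, the following minimal Python sketch computes both for a single pair of binary masks. The function names (`overlap_counts`, `dice_score`, `iou_score`), the nested-list mask format, and the smoothing term `eps` are assumptions made for this example and are not taken from the HMT-Net implementation, whose evaluation protocol (for instance, thresholding and averaging over images) may differ.

```python
from typing import Sequence, Tuple

# Hypothetical mask format for this sketch: rows of 0/1 integers,
# where 1 marks a lesion pixel and 0 marks background.
Mask = Sequence[Sequence[int]]


def overlap_counts(pred: Mask, target: Mask) -> Tuple[int, int, int]:
    """Return (intersection, predicted lesion pixels, true lesion pixels)."""
    inter = pred_sum = target_sum = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += p & t        # pixel is lesion in both masks
            pred_sum += p
            target_sum += t
    return inter, pred_sum, target_sum


def dice_score(pred: Mask, target: Mask, eps: float = 1e-7) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|); eps avoids division by zero."""
    inter, pred_sum, target_sum = overlap_counts(pred, target)
    return (2 * inter + eps) / (pred_sum + target_sum + eps)


def iou_score(pred: Mask, target: Mask, eps: float = 1e-7) -> float:
    """IoU = |A ∩ B| / |A ∪ B|; eps avoids division by zero."""
    inter, pred_sum, target_sum = overlap_counts(pred, target)
    union = pred_sum + target_sum - inter
    return (inter + eps) / (union + eps)


if __name__ == "__main__":
    pred = [[1, 1, 0],
            [0, 1, 0]]
    target = [[1, 0, 0],
              [0, 1, 1]]
    print(f"Dice: {dice_score(pred, target):.4f}")
    print(f"IoU:  {iou_score(pred, target):.4f}")
```

In this toy example the two masks share 2 lesion pixels, each mask contains 3, and their union contains 4. With the smoothing term, two empty masks score 1.0 on both measures, which is one common convention but not the only one.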