Automatic classification of dermatological images is an important technology that helps doctors classify skin diseases faster and more accurately. Recently, convolutional neural networks (CNNs) and Transformer networks have been employed to learn the local and global features of lesion images, respectively. However, existing works mainly rely on a single neural network for feature extraction, which limits classification performance. To tackle this problem, this paper proposes a novel fusion model, named ConvNeXt-ST-AFF, that combines the strengths of ConvNeXt and Swin Transformer (ConvNeXt-ST in the model name). In the proposed model, pretrained ConvNeXt and Swin Transformer networks extract local and global features from images, respectively, and these features are then fused using Attentional Feature Fusion (AFF) modules (AFF in the model name). Additionally, to strengthen the model's attention to skin lesion regions during training, an Efficient Channel Attention (ECA) module is incorporated into the ConvNeXt network. Moreover, the proposed model employs a denoising module to reduce the influence of artifacts and improve image contrast. Experiments conducted on two datasets demonstrate that the proposed ConvNeXt-ST-AFF model achieves better classification performance across multiple evaluation metrics than the original ConvNeXt and Swin Transformer, as well as other state-of-the-art classification models.
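To make the two attention mechanisms named above concrete, the following is a minimal NumPy sketch of channel attention in the style of ECA, followed by an AFF-style gated fusion of two feature maps. It is an illustration under stated assumptions, not the implementation used in this paper: the function names (`eca`, `aff_fuse`), the tensor shapes, the fixed kernel size, the reduction ratio, and the random weights are all hypothetical; batch normalization is omitted; and the abstract does not specify where in the backbones these modules are placed.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def eca(x, kernel):
    """ECA-style channel attention on a feature map x of shape (C, H, W).

    kernel: 1D weights of odd length, shared across all channels
    (assumption: fixed size here; ECA normally derives it from C).
    """
    pooled = x.mean(axis=(1, 2))                 # global average pool -> (C,)
    k = len(kernel)
    padded = np.pad(pooled, k // 2)              # zero padding keeps length C
    # 1D convolution across neighbouring channels (no dimensionality reduction)
    logits = np.array([padded[i:i + k] @ kernel for i in range(len(pooled))])
    return x * sigmoid(logits)[:, None, None]    # rescale each channel


def aff_fuse(x, y, local_w, global_w):
    """AFF-style fusion of x and y, both (C, H, W).

    local_w / global_w: pairs (W1, W2) with W1 of shape (C // r, C) and
    W2 of shape (C, C // r), acting as point-wise (1x1) convolutions.
    """
    s = x + y                                    # initial integration
    l1, l2 = local_w
    g1, g2 = global_w
    # Local context: bottleneck applied independently at every position
    hidden = np.maximum(np.einsum("oc,chw->ohw", l1, s), 0.0)
    local_ctx = np.einsum("oc,chw->ohw", l2, hidden)
    # Global context: same bottleneck shape applied to the pooled descriptor
    global_ctx = g2 @ np.maximum(g1 @ s.mean(axis=(1, 2)), 0.0)
    m = sigmoid(local_ctx + global_ctx[:, None, None])  # weights in (0, 1)
    return m * x + (1.0 - m) * y                 # soft selection between inputs


# Hypothetical stand-ins for ConvNeXt (local) and Swin Transformer (global) features
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4
conv_raw = rng.normal(size=(C, H, W))
swin_feat = rng.normal(size=(C, H, W))

conv_feat = eca(conv_raw, kernel=rng.normal(size=3))
local_w = (rng.normal(size=(C // r, C)), rng.normal(size=(C, C // r)))
global_w = (rng.normal(size=(C // r, C)), rng.normal(size=(C, C // r)))
fused = aff_fuse(conv_feat, swin_feat, local_w, global_w)
print(fused.shape)  # (8, 4, 4)
```

Because the fusion weights lie strictly between 0 and 1, each fused value is a convex combination of the corresponding local and global feature values, so the module selects softly between the two branches rather than simply summing or concatenating them.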