Cervical cancer is the fourth most common cancer in the world. Whole-slide images (WSIs) are an important standard for the diagnosis of cervical cancer. Missed diagnoses and misdiagnoses often occur due to the high similarity in pathological cervical images, the large number of readings, the long reading time, and the insufficient experience levels of pathologists. Existing models have insufficient feature extraction and representation capabilities, and they suffer from insufficient pathological classification. Therefore, this work first designs an image processing algorithm for data augmentation. Second, the deep convolutional features are extracted by fine-tuning pre-trained deep network models, including ResNet50 v2, DenseNet121, Inception v3, VGGNet19, and Inception-ResNet, and then local binary patterns and a histogram of the oriented gradient to extract traditional image features are used. Third, the features extracted by the fine-tuned models are serially fused according to the feature representation ability parameters and the accuracy of multiple experiments proposed in this paper, and spectral embedding is used for dimension reduction. Finally, the fused features are inputted into the Analysis of Variance-F value-Spectral Embedding Net (AF-SENet) for classification. There are four different pathological images of the dataset: normal, low-grade squamous intraepithelial lesion (LSIL), high-grade squamous intraepithelial lesion (HSIL), and cancer. The dataset is divided into a training set (90%) and a test set (10%). The serial fusion effect of the deep features extracted by Resnet50v2 and DenseNet121 (C5) is the best, with average classification accuracy reaching 95.33%, which is 1.07% higher than ResNet50 v2 and 1.05% higher than DenseNet121. The recognition ability is significantly improved, especially in LSIL, reaching 90.89%, which is 2.88% higher than ResNet50 v2 and 2.1% higher than DenseNet121. Thus, this method significantly improves the accuracy and generalization ability of pathological cervical WSI recognition by fusing deep features.