Cervical cancer poses a significant health concern for women globally, ranking as the seventh most common cancer overall and the fourth most frequent cancer among women. Classification of cytopathology images is used to diagnose this condition, and automating the process is of particular interest because manual examination is prone to human error. This study presents an approach that integrates transfer learning, ensemble learning, and a transformer encoder to classify cervical cancer from pap-smear images in the SIPaKMeD dataset. Combining these methods minimizes human involvement in the classification task. Initially, individual models based on transfer learning are trained. Their distinct characteristics are then combined to create an ensemble model. The output of this ensemble model is fed into the proposed transformer encoder, which is based on the Vision Transformer (ViT) model. The results highlight the effectiveness of this methodology. The VGG16 model achieves an accuracy of 97.04% and an F1 score of 97.06% when classifying the five categories of the SIPaKMeD dataset. The ensemble learning model surpasses this, with an accuracy of 97.37%. The transformer encoder model outperforms all other models, achieving an accuracy of 97.54%. Through the combination of transfer learning, ensemble learning, and the transformer encoder model, this research introduces a method for automating the classification of cervical cancer. The findings underscore the potential of the proposed approach to improve the precision and effectiveness of cervical cancer diagnosis.
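The abstract does not specify how the ensemble's features reach the transformer encoder. The following minimal NumPy sketch rests on the assumption that each fine-tuned backbone contributes one pooled feature vector. Each vector is projected to a shared width and treated as one token, and a ViT-style encoder block then attends over these tokens together with a class token before a five-way softmax. This is not the authors' implementation. The function names `encoder_block` and `classify`, the feature sizes (512, 1280, 2048), and the single-head, single-layer design are hypothetical. All weights are random and untrained, so the sketch only illustrates the data flow.

```python
import numpy as np

# Random, untrained weights: this sketch shows shapes and data flow only (assumption).
rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-6):
    # Normalise each token vector to zero mean and unit variance
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def encoder_block(tokens, params):
    # Hypothetical ViT-style pre-norm block: single-head self-attention, then a ReLU MLP,
    # each wrapped in a residual connection
    h = layer_norm(tokens)
    q, k, v = h @ params["Wq"], h @ params["Wk"], h @ params["Wv"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (tokens, tokens) attention weights
    tokens = tokens + attn @ v @ params["Wo"]
    h = layer_norm(tokens)
    return tokens + np.maximum(h @ params["W1"], 0.0) @ params["W2"]

def classify(backbone_features, d=64, n_classes=5):
    # Project each backbone's pooled feature vector to width d: one token per model
    tokens = [f @ rng.normal(0, f.shape[0] ** -0.5, (f.shape[0], d)) for f in backbone_features]
    cls = np.zeros((1, d))  # a learnable [CLS] token in a real model; zeros here
    x = np.vstack([cls] + [t[None, :] for t in tokens])
    params = {n: rng.normal(0, d ** -0.5, (d, d)) for n in ["Wq", "Wk", "Wv", "Wo", "W1", "W2"]}
    x = encoder_block(x, params)
    # Classify from the [CLS] token into the five SIPaKMeD categories
    logits = layer_norm(x[0]) @ rng.normal(0, d ** -0.5, (d, n_classes))
    return softmax(logits)

# Hypothetical pooled features from three backbones (e.g. a VGG16-sized 512-d vector)
features = [rng.normal(size=512), rng.normal(size=1280), rng.normal(size=2048)]
probs = classify(features)
```

Treating each backbone as a token lets self-attention weight the models against each other per image, which is one plausible reading of feeding an ensemble into a transformer encoder. A fixed averaging of the models' predictions would be a simpler alternative.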