Evaluating the severity of ulcerative colitis (UC) with the Mayo endoscopic subscore (MES) is crucial for understanding a patient's condition and providing effective treatment. However, UC lesions vary widely in appearance in endoscopic images, which exacerbates the interclass similarity and intraclass variation that complicate MES classification. In addition, inexperience and review fatigue among endoscopists pose nontrivial challenges to the reliability and repeatability of MES evaluations. In this paper, we propose a pyramid hybrid feature fusion framework (PHF3) as an auxiliary diagnostic tool for clinical UC severity classification. Specifically, PHF3 has a dual-branch hybrid architecture that combines ResNet50 with a pyramid vision Transformer (PvT). The local features extracted by ResNet50 represent the relationship between the intestinal wall at the near-shot point and its depth, while the global representations modeled by the PvT capture similar information across the cross-section of the intestinal cavity. Furthermore, a feature fusion module (FFM) is designed to combine the local features with the global representations, and second-order pooling (SOP) is applied to enhance discriminative information during classification. Experimental results show that the proposed PHF3 model achieves competitive performance compared with existing methods: the areas under the receiver operating characteristic curve (AUC) for MES 0, MES 1, MES 2, and MES 3 reached 0.996, 0.972, 0.967, and 0.990, respectively, and the overall accuracy reached 88.91%. Thus, the proposed method is valuable for developing an auxiliary assessment system for UC severity.
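The abstract does not specify which second-order pooling variant PHF3 uses or exactly where it sits after the FFM. The sketch below is therefore a generic illustration and not the authors' implementation. It shows plain covariance pooling applied to a C x N feature map, with the H x W grid flattened to N positions. Instead of keeping one mean per channel, as global average pooling would, it keeps the pairwise channel covariances as the image descriptor. The function names `second_order_pool` and `upper_triangle` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def second_order_pool(features: Matrix) -> Matrix:
    """Covariance (second-order) pooling of a C x N feature map.

    features[c][n] is the activation of channel c at spatial position n.
    Returns the C x C covariance matrix of the channels over positions.
    (Hypothetical helper; generic SOP, not the PHF3 authors' code.)
    """
    num_channels = len(features)
    num_positions = len(features[0])
    # Center each channel by its mean over all spatial positions.
    means = [sum(row) / num_positions for row in features]
    centered = [[x - m for x in row] for row, m in zip(features, means)]
    # Sigma[i][j] = (1/N) * sum_n centered[i][n] * centered[j][n]
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j])) / num_positions
            for j in range(num_channels)
        ]
        for i in range(num_channels)
    ]


def upper_triangle(matrix: Matrix) -> List[float]:
    """Flatten the upper triangle (incl. diagonal) of a symmetric matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i, n)]


if __name__ == "__main__":
    # Two channels at four positions; channel 2 is exactly twice channel 1.
    cov = second_order_pool([[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]])
    print(upper_triangle(cov))  # prints [1.25, 2.5, 5.0]
```

The pooled descriptor has C(C+1)/2 entries, which becomes very large for a 2048-channel ResNet50 output. For that reason, practical SOP layers commonly reduce the channel count first and often add a matrix normalization step such as a matrix square root. Whether PHF3 does either is not stated in the abstract.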