Infrared camera trapping is a widely used, non-intrusive method for wildlife monitoring that captures large volumes of wildlife images. Combined with automatic image identification, it can greatly reduce the workload of zoologists. To achieve higher accuracy in wildlife recognition, an integrated model based on multi-branch aggregation and the Squeeze-and-Excitation network is introduced. The model uses multi-branch aggregated transformations to extract features, and a Squeeze-and-Excitation block to adaptively recalibrate channel-wise feature responses by explicitly modeling the interdependencies between channels. The efficacy of the integrated model is tested on two datasets: the Snapshot Serengeti dataset and our own dataset. On the Snapshot Serengeti dataset, the integrated model recognizes 26 wildlife species, with highest Top-1 accuracy (the correct class is the most probable class) and Top-5 accuracy (the correct class is among the five most probable classes) of 95.3% and 98.8%, respectively. On our own dataset, the integrated model improves recognition accuracy by up to 4.4% compared with the ROI-CNN algorithm and ResNet (Deep Residual Network).
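The Top-1 and Top-5 metrics defined above are instances of Top-k accuracy. The following is a minimal sketch of that calculation, not the evaluation code used in this work; the function name `top_k_accuracy` and the toy scores and labels are hypothetical and chosen only for illustration.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of per-class score lists, one list per image (hypothetical input format)
    labels: list of true class indices, one per image
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from most to least probable.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example (assumed data): 3 images, 4 classes.
scores = [
    [0.70, 0.10, 0.15, 0.05],  # true class 0 is ranked first
    [0.20, 0.50, 0.25, 0.05],  # true class 2 is ranked second
    [0.40, 0.30, 0.20, 0.10],  # true class 3 is ranked last
]
labels = [0, 2, 3]

top1 = top_k_accuracy(scores, labels, k=1)  # 1 of 3 images correct
top2 = top_k_accuracy(scores, labels, k=2)  # 2 of 3 images correct
print(f"Top-1: {top1:.3f}, Top-2: {top2:.3f}")
```

With k = 1 and k = 5 applied to the model's class scores, the same function yields the Top-1 and Top-5 figures of the kind reported here.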