Modern networks are exposed to cyber threats such as hackers and viruses, and intrusion detection is widely regarded as one of the most effective means of detecting and defending against them. Deep neural networks are commonly used for intrusion detection, but two challenges remain: improving the model's ability to extract feature information, and reducing computational overhead while retaining local feature information. To address these issues, this paper proposes BBO-CFAT, a model that combines the Biogeography-Based Optimization (BBO) algorithm for feature selection with an improved Transformer that preserves context information at lower computational cost. Specifically, BBO-CFAT uses roulette selection to control the migration and mutation operators, and uses feature information entropy to weight the updates of the adaptive variables in these operators, which makes the feature selection more reliable. The Transformer framework is designed hierarchically to facilitate the acquisition of context information, and depthwise separable convolutions are employed to reduce computational overhead, improving computational efficiency and training speed. In experiments, BBO-CFAT achieves 99.1% accuracy on the CIC-IDS2017 dataset and 97.5% on the NSL-KDD dataset, outperforming the comparison models. Overall, BBO-CFAT offers a comprehensive approach to intrusion detection that balances feature preservation, computational efficiency, and accuracy.
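
To make the efficiency argument for depthwise separable convolutions concrete, the following minimal sketch compares the number of weights in a standard convolution with that of its depthwise separable counterpart. It is an illustration under stated assumptions, not the BBO-CFAT implementation: it assumes one-dimensional convolutions without bias terms, and the function names and the example layer sizes (64 input channels, 128 output channels, kernel size 3) are hypothetical choices made only for this example.

```python
def standard_conv1d_params(c_in: int, c_out: int, kernel_size: int) -> int:
    """Weight count of a standard 1D convolution (bias omitted).

    Every output channel has its own kernel spanning all input channels.
    """
    return kernel_size * c_in * c_out


def depthwise_separable_conv1d_params(c_in: int, c_out: int, kernel_size: int) -> int:
    """Weight count of a depthwise separable 1D convolution (bias omitted).

    Depthwise step: one kernel per input channel.
    Pointwise step: a 1x1 convolution that mixes channels.
    """
    depthwise = kernel_size * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def reduction_ratio(c_in: int, c_out: int, kernel_size: int) -> float:
    """Separable-to-standard weight ratio; equals 1/c_out + 1/kernel_size."""
    separable = depthwise_separable_conv1d_params(c_in, c_out, kernel_size)
    standard = standard_conv1d_params(c_in, c_out, kernel_size)
    return separable / standard


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only for illustration.
    c_in, c_out, k = 64, 128, 3
    print("standard: ", standard_conv1d_params(c_in, c_out, k))
    print("separable:", depthwise_separable_conv1d_params(c_in, c_out, k))
    print("ratio:     %.3f" % reduction_ratio(c_in, c_out, k))
```

For these example sizes the separable layer needs roughly one third of the weights of the standard layer. The ratio 1/c_out + 1/kernel_size shows that the saving grows with both the kernel size and the number of output channels.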