The growth of the internet has advanced information-sharing capabilities and vastly increased the importance of global network security. However, because new and inconspicuous abnormal behaviors are nearly impossible to detect in massive network access environments, modern intrusion detection systems suffer from high false-positive (FP) and false-negative (FN) rates. To overcome this, this paper proposes a hybrid deep learning model that significantly mitigates the effects of the class imbalance that is typical of attack data. First, it rebalances the data using random undersampling and the synthetic minority oversampling technique (SMOTE). Then, convolutional neural networks (CNNs) extract local and spatial features, and a transformer encoder extracts global and temporal features. This novel combination increases recognition accuracy at the algorithm level, which is crucial to reducing FPs and FNs. The model was evaluated on multiclass classification using the NSL-KDD and CICIDS2017 benchmark datasets, and the results show that it achieves higher classification accuracy and lower FP rates than state-of-the-art intrusion detection models. Moreover, it significantly improves the detection rate of low-frequency attacks.
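
To illustrate the rebalancing step, the following is a minimal NumPy sketch, not the implementation evaluated in this paper. It randomly undersamples a majority class and then adds synthetic minority samples by interpolating between a sample and one of its nearest same-class neighbors, which is the core idea behind synthetic minority oversampling. The function names, the class sizes, the target count of 300, and the neighbor count `k=5` are all assumptions chosen for illustration.

```python
import numpy as np


def random_undersample(X, y, label, n_keep, rng):
    """Keep a random subset of n_keep rows of class `label`; other classes are untouched."""
    idx = np.flatnonzero(y == label)
    if len(idx) <= n_keep:
        return X, y
    keep = rng.choice(idx, size=n_keep, replace=False)
    mask = y != label          # start with every row of the other classes
    mask[keep] = True          # add back the sampled rows of `label`
    return X[mask], y[mask]


def smote_like_oversample(X, y, label, n_target, k, rng):
    """Grow class `label` to n_target rows with interpolated synthetic samples."""
    Xc = X[y == label]
    n_new = n_target - len(Xc)
    if n_new <= 0 or len(Xc) < 2:
        return X, y
    k = min(k, len(Xc) - 1)
    # Pairwise distances inside the minority class (quadratic memory; fine for a sketch).
    dist = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)              # a sample is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(Xc), size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                # interpolation factor in [0, 1)
    synthetic = Xc[base] + gap * (Xc[chosen] - Xc[base])
    X_out = np.vstack([X, synthetic])
    y_out = np.concatenate([y, np.full(n_new, label, dtype=y.dtype)])
    return X_out, y_out


# Hypothetical toy data: 1000 "normal" records (class 0) and 50 "attack" records (class 1).
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(1050, 4))
y_raw = np.array([0] * 1000 + [1] * 50)

X_bal, y_bal = random_undersample(X_raw, y_raw, label=0, n_keep=300, rng=rng)
X_bal, y_bal = smote_like_oversample(X_bal, y_bal, label=1, n_target=300, k=5, rng=rng)
counts = np.bincount(y_bal)
print(counts)
```

In this sketch, resampling is applied to the training split only. Applying it before the train/test split would leak synthetic neighbors of test records into training.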