Network traffic is a crucial indicator of network performance, and network intrusions typically result in traffic anomalies. Because traffic data are high-dimensional, capturing the differences and commonalities among input features is challenging. To address this, we propose a multi-scale feature extraction method based on global additive attention (MSFE-GAA), which integrates temporal position information encoded by trigonometric functions to capture multi-scale temporal features. An improved Transformer with a similarity matrix captures the commonalities and differences among features, and global additive attention models long-term dependencies. Experiments on two public datasets show that MSFE-GAA outperforms the baseline models.
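As an illustration of encoding time position with trigonometric functions, the sketch below builds a sine/cosine position table and adds it to a window of traffic features. The abstract does not give the exact formulation, so this assumes the standard sinusoidal scheme of the original Transformer; the function names, the base constant 10000, and the requirement of an even feature dimension are all assumptions made for illustration and may differ from what MSFE-GAA uses.

```python
import math


def sinusoidal_position_encoding(num_steps, dim):
    """Return a num_steps x dim table of sine/cosine position codes.

    Assumption: standard Transformer-style encoding, not necessarily
    the exact variant used by MSFE-GAA.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    table = []
    for pos in range(num_steps):
        row = []
        for i in range(0, dim, 2):
            # The wavelength grows geometrically with the channel index,
            # so different channels describe time at different scales.
            angle = pos / (10000 ** (i / dim))
            row.append(math.sin(angle))  # even channel
            row.append(math.cos(angle))  # odd channel
        table.append(row)
    return table


def add_position(features, encoding):
    """Add the position code element-wise to a window of feature vectors."""
    return [
        [x + p for x, p in zip(feat_row, pos_row)]
        for feat_row, pos_row in zip(features, encoding)
    ]


# Hypothetical input: 4 time steps, 6 features per step, all zeros.
window = [[0.0] * 6 for _ in range(4)]
encoded = add_position(window, sinusoidal_position_encoding(4, 6))
```

Low-index channels oscillate quickly and high-index channels slowly, which is one plausible way for a single encoding to expose temporal structure at several scales to the attention layers.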