The dark web was often utilized for illicit activities, data breaches, and the dissemination of malicious software. Researchers consistently employed various machine learning and deep learning approaches to detect dark web traffic. However, existing studies overlooked the comprehensive capture of multi-scale information in traffic data, resulting in an inability to fully extract features when dealing with complex structural data, especially in datasets with an imbalanced number of samples. To address this problem, our paper proposed DarkGuardNet for the recognition of dark web traffic and application type classification. Specifically, we conducted dark web traffic analysis based on sessions and designed a Spatio-temporal Feature Fusion (STFF) module to capture multi-scale feature correlations. This module extended the receptive field to deepen the understanding of complex data, allowing for the precise extraction of spatiotemporal features in imbalanced samples. In addition, we used Multi-Head Self-Attention (MHSA) to mine potential relationships between statistical features of dark web traffic, enabling the model to focus on key features of categories with small sample sizes. Finally, we conducted experiments on a new imbalanced dark web traffic dataset, formed by merging ISCXVPN and ISCXTor. The results indicated that the method achieved an accuracy of 0.999 in dark web traffic recognition and an accuracy of 0.986 in application type classification, surpassing other advanced methods. The Data is available at:https://github.com/niu954325618/Darknet2024/tree/main.