Industrial anomaly detection faces three main challenges: imbalanced data samples, a lack of anomaly labels, and complex spatiotemporal relationships in high-dimensional data. To address them, this paper proposes a novel multi-modal time-series anomaly detection model that combines attention mechanisms with adversarial training. The model first uses a graph attention mechanism to extract sequence correlation features from the multi-modal time-series data and sums these features with the original data to form a dual-feature data representation. This representation is then fed, in a self-supervised manner, into the encoder-decoder network of a variational autoencoder for reconstruction, and anomalies are detected by analyzing the error between the input and the reconstructed data. During reconstruction, the model also applies spatiotemporal attention mechanisms and adversarial training to strengthen feature extraction and improve generalization. Compared with five commonly used baseline models, the proposed model achieves superior anomaly detection performance in scenarios with high-dimensional data and imbalanced abnormal samples, and it also performs well on real industrial production and processing datasets.
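The detection step described above, scoring each time step by the error between the input and its reconstruction, can be illustrated with a minimal sketch. It is not the paper's implementation: the mean-squared-error score, the quantile-based threshold, and the function names `reconstruction_errors`, `quantile_threshold`, and `flag_anomalies` are all assumptions made for illustration, and the reconstructions are taken as given rather than produced by the variational autoencoder.

```python
from statistics import mean


def reconstruction_errors(inputs, reconstructions):
    """Mean squared error per time step, averaged over all channels.

    Both arguments are lists of time steps; each time step is a list
    holding one value per channel (hypothetical layout).
    """
    return [
        mean((x - r) ** 2 for x, r in zip(x_t, r_t))
        for x_t, r_t in zip(inputs, reconstructions)
    ]


def quantile_threshold(errors, q=0.99):
    """Pick a threshold from errors on (mostly normal) reference data.

    Assumption: a simple index-based empirical quantile; the source does
    not say how the decision threshold is chosen.
    """
    ordered = sorted(errors)
    idx = min(len(ordered) - 1, max(0, int(q * len(ordered))))
    return ordered[idx]


def flag_anomalies(errors, threshold):
    """Mark a time step as anomalous when its error exceeds the threshold."""
    return [e > threshold for e in errors]


# Toy data: the third time step is reconstructed poorly.
inputs = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
reconstructions = [[1.0, 2.0], [1.0, 2.0], [4.0, 6.0]]

errors = reconstruction_errors(inputs, reconstructions)
flags = flag_anomalies(errors, threshold=1.0)
```

In this toy case only the last time step is flagged, because its reconstruction deviates strongly from the input. In practice the threshold would be calibrated on errors from data assumed to be normal, for example with `quantile_threshold`.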