Advanced Persistent Threat (APT) attacks are targeted attacks launched by professional hacker organizations using advanced techniques, and they cause significant harm. There is therefore an urgent need to detect APT malware and attribute it to the organizations behind it. This paper proposes an improved Transformer-based method for APT malware detection and attribution. For detection, dynamic behaviors of APT malware are extracted, and an information filtering gate mechanism is applied to reduce redundant feature noise in the original Transformer model. A contrastive learning-constrained model is used for information filtering, self-training, and optimization. For attribution, static features of APT malware samples are extracted; global features of the sequence data are built with the Transformer model, local features are built with an Incremental Dilated Convolutional Neural Network, and the two sets of features are fused using an attention mechanism. The proposed method outperforms the baseline methods.
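
To make the three building blocks named above more concrete, the following is a minimal pure-Python sketch of a gating filter, a dilated one-dimensional convolution, and attention-based fusion of a global and a local feature vector. It is an illustration under stated assumptions, not the implementation used in this paper: the function names (`gate_filter`, `dilated_conv1d`, `attention_fuse`), the sigmoid form of the gate, the dot-product scoring against a `query` vector, and all numeric values are hypothetical, since the abstract does not specify these details.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gate_filter(features, gate_logits):
    """Assumed gate form: scale each feature by a sigmoid gate in (0, 1).

    Gates near 0 suppress a feature (treated as redundant noise);
    gates near 1 pass it through almost unchanged.
    """
    return [f * sigmoid(g) for f, g in zip(features, gate_logits)]


def dilated_conv1d(seq, kernel, dilation):
    """Valid 1-D cross-correlation with `dilation` steps between kernel taps.

    A larger dilation widens the receptive field without adding weights;
    stacking layers with growing dilation is one way to read "incremental".
    """
    span = (len(kernel) - 1) * dilation
    return [
        sum(k * seq[i + j * dilation] for j, k in enumerate(kernel))
        for i in range(len(seq) - span)
    ]


def attention_fuse(global_vec, local_vec, query):
    """Fuse two equal-length feature vectors with softmax attention weights.

    Each vector is scored by its dot product with `query` (a hypothetical
    learned parameter); the fused vector is the weighted sum of the two.
    """
    scores = [
        sum(q * v for q, v in zip(query, vec))
        for vec in (global_vec, local_vec)
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [
        weights[0] * g + weights[1] * l
        for g, l in zip(global_vec, local_vec)
    ]
    return fused, weights


if __name__ == "__main__":
    # Toy values only; they are not taken from any dataset in the paper.
    filtered = gate_filter([2.0, -1.0, 0.5], [0.0, 4.0, -4.0])
    local = dilated_conv1d([1, 2, 3, 4, 5], kernel=[1, 1], dilation=2)
    fused, weights = attention_fuse([1.0, 0.0], [0.0, 1.0], query=[0.0, 0.0])
    print(filtered, local, fused, weights)
```

In a real model the gate logits, convolution kernels, and attention query would be learned parameters inside a deep learning framework; plain lists are used here only so that the data flow of each step is easy to follow.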