Morphological information from histopathology slides and molecular profiles from genomic data are both important for predicting the stage of lung cancer. However, most deep learning-based approaches rely on histopathology or genomics alone, failing to exploit the complementary information in the two modalities and ignoring the staining differences that arise in histopathology images under different staining conditions. In this study, we propose a Multimodal Multi-scale Attention Model (MMAM) that predicts a patient's lung cancer stage through end-to-end multimodal fusion of multi-scale histopathology image features with genomic features via augmented attention. The proposed MMAM consists of two phases. In the first phase, a Staining Difference Elimination Network (SDEN) is proposed for stable extraction of histopathology image features, eliminating the differences that arise from hospital- and specimen-specific staining conditions. In the second phase, we design a Multimodal Multi-scale Fusion Network (MMFN) to efficiently fuse the features of the different modalities. We validated our approach on the lung cancer dataset from The Cancer Genome Atlas (TCGA). The results show that the proposed MMAM improves the staging performance of the histopathology-genomics fusion network, reduces the effect of staining-induced appearance variability on blurry images, achieves an AUC of 88.51%, and outperforms other popular methods in predicting lung cancer stage.
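As a rough illustration of this two-phase design, the PyTorch sketch below pairs a stand-in SDEN patch encoder with an attention-weighted MMFN fusion head. All layer choices, feature dimensions, the number of magnification scales, the gene-expression input size, and the four-stage output are illustrative assumptions; the paper's actual SDEN and MMFN architectures and its augmented-attention mechanism are not reproduced here.

```python
# Minimal two-phase sketch of an MMAM-style pipeline (assumed architecture).
import torch
import torch.nn as nn

class SDEN(nn.Module):
    """Phase 1 (assumed form): a small CNN encoder standing in for the
    Staining Difference Elimination Network, mapping a patch to a
    stain-robust feature vector."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, patch):            # patch: (B, 3, H, W)
        return self.encoder(patch)       # (B, feat_dim)

class MMFN(nn.Module):
    """Phase 2 (assumed form): fuses multi-scale image features with a
    projected genomic feature via an attention-weighted sum, then
    classifies the cancer stage."""
    def __init__(self, feat_dim: int = 256, gene_dim: int = 1000,
                 n_stages: int = 4):
        super().__init__()
        self.gene_proj = nn.Linear(gene_dim, feat_dim)
        self.attn = nn.Linear(feat_dim, 1)        # scores each modality token
        self.classifier = nn.Linear(feat_dim, n_stages)

    def forward(self, img_feats, gene_expr):
        # img_feats: (B, n_scales, feat_dim); gene_expr: (B, gene_dim)
        gene_feat = self.gene_proj(gene_expr).unsqueeze(1)  # (B, 1, feat_dim)
        tokens = torch.cat([img_feats, gene_feat], dim=1)   # (B, n_scales+1, feat_dim)
        weights = torch.softmax(self.attn(tokens), dim=1)   # attention over tokens
        fused = (weights * tokens).sum(dim=1)               # (B, feat_dim)
        return self.classifier(fused)                       # stage logits

# Usage: encode patches at two magnifications, then fuse with gene expression.
sden, mmfn = SDEN(), MMFN()
p10x = torch.randn(2, 3, 224, 224)                 # low-magnification patches
p20x = torch.randn(2, 3, 224, 224)                 # high-magnification patches
img_feats = torch.stack([sden(p10x), sden(p20x)], dim=1)   # (2, 2, 256)
logits = mmfn(img_feats, torch.randn(2, 1000))             # (2, 4)
```

In this sketch the genomic vector is treated as one more token alongside the per-scale image features, so a single softmax attention decides how much each scale and the genomic modality contribute to the fused representation before stage classification.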