Tumor antigens, which can be recognized by T cell receptors (TCRs) and induce immune responses, are key targets in cancer immunotherapy. However, precise screening of immunogenic tumor antigens remains a major challenge because of human leukocyte antigen (HLA) restriction and tumor antigen escape. Here, we introduce MultiTAP (Multi-modal Tumor Antigen Predictor), a multi-modal framework that combines TCR-peptide-HLA sequence and structure features through an attention mechanism to accurately identify immunogenic tumor antigens. We construct the multi-modal TCR-peptide-HLA Dataset (TPHD), integrate its sequence and structure data, and enhance antigen features with residue-level peptide-HLA (pHLA) structural features, which makes the immunogenicity predictions for tumor antigens interpretable. MultiTAP outperforms existing baseline models in predicting the immunogenicity of tumor antigens. In comprehensive out-of-distribution (OOD) assessments, MultiTAP remains robust across diverse HLA phenotypes and continuously evolving epitope distributions. Overall, MultiTAP offers a promising approach for cancer immunotherapies that target tumor antigens.