The rapid serial visual presentation-based brain–computer interface (RSVP-BCI) system achieves the recognition of target images by extracting event-related potential (ERP) features from electroencephalogram (EEG) signals and then building target classification models. Currently, how to reduce the training and calibration time for classification models across different subjects is a crucial issue in the practical application of RSVP. To address this issue, a zero-calibration (ZC) method termed Attention-ProNet, which involves meta-learning with a prototype network integrating multiple attention mechanisms, was proposed in this study. In particular, multiscale attention mechanisms were used for efficient EEG feature extraction. Furthermore, a hybrid attention mechanism was introduced to enhance model generalization, and attempts were made to incorporate suitable data augmentation and channel selection methods to develop an innovative and high-performance ZC RSVP-BCI decoding model algorithm. The experimental results demonstrated that our method achieved a balance accuracy (BA) of 86.33% in the decoding task for new subjects. Moreover, appropriate channel selection and data augmentation methods further enhanced the performance of the network by affording an additional 2.3% increase in BA. The model generated by the meta-learning prototype network Attention-ProNet, which incorporates multiple attention mechanisms, allows for the efficient and accurate decoding of new subjects without the need for recalibration or retraining.