Electroencephalography (EEG) datasets are characterized by low signal-to-noise ratios and label noise that is difficult to quantify, both of which hinder classification performance in rapid serial visual presentation (RSVP) tasks. Previous approaches have relied primarily on supervised learning (SL), which can lead to overfitting and reduced generalization performance. In this paper, we propose a novel multi-task collaborative network (MTCN) that integrates SL and self-supervised learning (SSL) to extract more generalized EEG representations. The original SL task, i.e., RSVP EEG classification, is used to capture initial representations and establish classification thresholds for targets and non-targets. Two SSL tasks, masked temporal recognition and masked spatial recognition, are designed to enhance the extraction of temporal dynamics and to capture the inherent spatial relationships among brain regions, respectively. By learning from these tasks simultaneously, MTCN derives a comprehensive representation that captures the essence of all tasks, thereby mitigating the risk of overfitting and enhancing generalization performance. Moreover, to facilitate collaboration between SL and SSL, MTCN explicitly decomposes features into task-specific and task-shared features, leveraging label information through SL and feature information through SSL. Experiments conducted on the THU, CAS, and GIST datasets demonstrate the significant advantages of learning more generalized features in RSVP tasks. Our code is publicly available at https://github.com/Tammie-Li/MTCN.
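
To make the two masked recognition tasks concrete, the following is a minimal sketch of how self-supervised inputs and pseudo-labels could be generated from a single EEG epoch. It is an illustration under stated assumptions, not the released MTCN implementation: the function names (`mask_temporal`, `mask_spatial`, `total_loss`), the zero-fill masking, the equal-length time segments, the grouping of channels into regions, and the weight `w_ssl` are all hypothetical choices that the abstract does not specify.

```python
import random


def mask_temporal(epoch, n_segments, rng):
    """Zero one of `n_segments` equal-length time segments in every channel.

    `epoch` is a list of channels, each a list of samples. Returns the masked
    copy and the index of the hidden segment, which serves as the pseudo-label.
    Assumption: trailing samples that do not fill a whole segment are never masked.
    """
    seg_len = len(epoch[0]) // n_segments
    k = rng.randrange(n_segments)
    start, stop = k * seg_len, (k + 1) * seg_len
    masked = [
        [0.0 if start <= t < stop else v for t, v in enumerate(channel)]
        for channel in epoch
    ]
    return masked, k


def mask_spatial(epoch, regions, rng):
    """Zero every channel belonging to one randomly chosen region.

    `regions` is a list of channel-index lists (a hypothetical stand-in for
    brain regions). Returns the masked copy and the hidden region's index.
    """
    k = rng.randrange(len(regions))
    hidden = set(regions[k])
    masked = [
        [0.0] * len(channel) if i in hidden else list(channel)
        for i, channel in enumerate(epoch)
    ]
    return masked, k


def total_loss(l_cls, l_temporal, l_spatial, w_ssl=0.5):
    """Combine the SL loss with the two SSL losses (assumed weighted sum)."""
    return l_cls + w_ssl * (l_temporal + l_spatial)


if __name__ == "__main__":
    rng = random.Random(0)
    epoch = [[1.0] * 8 for _ in range(4)]  # toy epoch: 4 channels x 8 samples
    x_t, y_t = mask_temporal(epoch, n_segments=4, rng=rng)
    x_s, y_s = mask_spatial(epoch, regions=[[0, 1], [2, 3]], rng=rng)
    print("hidden time segment:", y_t, "hidden region:", y_s)
```

In this sketch, a network would be trained to predict the hidden segment or region index from the masked epoch while also classifying the unmasked epoch as target or non-target, so that the SSL pseudo-labels come for free from the data itself.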