Brain computer interfaces allow users to preform various tasks using only the electrical activity of the brain. BCI applications often present the user a set of stimuli and record the corresponding electrical response. The BCI algorithm will then have to decode the acquired brain response and perform the desired task. In rapid serial visual presentation (RSVP) tasks, the subject is presented with a continuous stream of images containing rare target images among standard images, while the algorithm has to detect brain activity associated with target images. In this work, we suggest a multimodal neural network for RSVP tasks. The network operates on the brain response and on the initiating stimulus simultaneously, providing more information for the BCI application. We present two variants of the multimodal network, a supervised model, for the case when the targets are known in advanced, and a semi-supervised model for when the targets are unknown. We test the neural networks with a RSVP experiment on satellite imagery carried out with two subjects. The multimodal networks achieve a significant performance improvement in classification metrics. We visualize what the networks has learned and discuss the advantages of using neural network models for BCI applications.