Non-invasive brain-computer interfaces (BCIs) have been widely used for neural decoding, linking neural signals to control devices. Hybrid BCI systems using electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) have received significant attention for overcoming the limitations of EEG- and fNIRS-standalone BCI systems. However, most hybrid EEG-fNIRS BCI studies have focused on late fusion because the two modalities differ in temporal resolution and recording location. Although late fusion improves the performance of hybrid BCIs, it has difficulty extracting features that are correlated across the EEG and fNIRS signals. Therefore, in this study, we proposed a deep learning-based early fusion structure, called the fNIRS-guided attention network (FGANet), which combines the two signals before the fully-connected layer. First, 1D EEG and fNIRS signals were converted into 3D EEG and fNIRS tensors to spatially align the EEG and fNIRS signals at the same time point. The proposed fNIRS-guided attention layer then extracted a joint representation of the EEG and fNIRS tensors based on neurovascular coupling, in which the spatially important regions were identified from the fNIRS signals and detailed neural patterns were extracted from the EEG signals. Finally, the prediction was obtained as a weighted sum of the prediction scores of the EEG features and the fNIRS-guided attention features, to alleviate performance degradation owing to the delayed fNIRS response. In the experimental results, the FGANet significantly outperformed the EEG-standalone network. Furthermore, the FGANet achieved 4.0% and 2.7% higher accuracy than the state-of-the-art algorithms in the mental arithmetic and motor imagery tasks, respectively.
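The two fusion steps described above can be illustrated with a minimal NumPy sketch. It is not the FGANet implementation: the function names, the tensor layout (feature channels, height, width, time), the sigmoid gating, and the fixed scalar `weight` are all assumptions made for illustration, and the network described here would learn these quantities rather than fix them.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def fnirs_guided_attention(eeg_feat, fnirs_feat):
    """Gate EEG features with a spatial map derived from fNIRS features.

    Both inputs are assumed to have shape (channels, height, width, time)
    and to be spatially aligned. This is a hypothetical simplification of
    an fNIRS-guided attention layer, not the published architecture.
    """
    # Collapse the fNIRS feature channels into one spatial map per time step,
    # then squash it to (0, 1) so it acts as a soft mask.
    attn = sigmoid(fnirs_feat.mean(axis=0, keepdims=True))  # (1, H, W, T)
    # Broadcasting applies the same mask to every EEG feature channel.
    return eeg_feat * attn


def fuse_predictions(eeg_scores, fga_scores, weight):
    """Weighted sum of two class-score vectors.

    `weight` is an assumed scalar in [0, 1]; a larger value trusts the
    EEG-only scores more, which helps while the fNIRS response is delayed.
    """
    return weight * eeg_scores + (1.0 - weight) * fga_scores


# Toy usage with random features and two-class scores.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((4, 16, 16, 10))
fnirs = rng.standard_normal((4, 16, 16, 10))
joint = fnirs_guided_attention(eeg, fnirs)
final = fuse_predictions(np.array([0.7, 0.3]), np.array([0.4, 0.6]), weight=0.5)
```

In this sketch the fNIRS branch decides only where to look, while the values that pass through the mask come from the EEG branch, mirroring the division of roles described above.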