Fusing complementary information from different modalities can lead to the discovery of more accurate diagnostic biomarkers for psychiatric disorders. However, biomarker discovery through data fusion is challenging since it requires extracting interpretable and reproducible patterns from data sets, consisting of shared/unshared patterns and of different orders. For example, multi-channel electroencephalography (EEG) signals from multiple subjects can be represented as a third-order tensor with modes: subject, time, and channel, while functional magnetic resonance imaging (fMRI) data may be in the form of subject by voxel matrices. Traditional data fusion methods rearrange higher-order tensors, such as EEG, as matrices to use matrix factorization-based approaches. In contrast, fusion methods based on coupled matrix and tensor factorizations (CMTF) exploit the potential multi-way structure of higher-order tensors. The CMTF approach has been shown to capture underlying patterns more accurately without imposing strong constraints on the latent neural patterns, i.e., biomarkers. In this paper, EEG, fMRI and structural MRI (sMRI) data collected during an auditory oddball task (AOD) from a group of subjects consisting of patients with schizophrenia and healthy controls, are arranged as matrices and higher-order tensors coupled along the subject mode, and jointly analyzed using structure-revealing CMTF methods (also known as advanced CMTF (ACMTF)) focusing on unique identification of underlying patterns in the presence of shared/unshared patterns. We demonstrate that joint analysis of the EEG tensor and fMRI matrix using ACMTF reveals significant and biologically meaningful components in terms of differentiating between patients with schizophrenia and healthy controls while also providing spatial patterns with high resolution and improving the clustering performance compared to the analysis of only the EEG tensor. We also show that these patterns are reproducible, and study reproducibility for different model parameters. In comparison to the joint independent component analysis (jICA) data fusion approach, ACMTF provides easier interpretation of EEG data by revealing a single summary map of the topography for each component. Furthermore, fusion of sMRI data with EEG and fMRI through an ACMTF model provides structural patterns; however, we also show that when fusing data sets from multiple modalities, hence of very different nature, preprocessing plays a crucial role.