It is becoming increasingly clear that combining multi-modal brain imaging data is able to provide more information for individual subjects by exploiting the rich multimodal information that exists. However, the number of studies that do true multimodal fusion (i.e. capitalizing on joint information among modalities) is still remarkably small given the known benefits. In part, this is because multi-modal studies require broader expertise in collecting, analyzing, and interpreting the results than do unimodal studies. In this paper, we start by introducing the basic reasons why multimodal data fusion is important and what it can do, and importantly how it can help us avoid wrong conclusions and help compensate for imperfect brain imaging studies. We also discuss the challenges that need to be confronted for such approaches to be more widely applied by the community. We then provide a review of the diverse studies that have used multimodal data fusion (primarily focused on psychosis) as well as provide an introduction to some of the existing analytic approaches. Finally, we discuss some up-and-coming approaches to multi-modal fusion including deep learning and multimodal classification which show considerable promise. Our conclusion is that multimodal data fusion is rapidly growing, but it is still underutilized. The complexity of the human brain coupled with the incomplete measurement provided by existing imaging technology makes multimodal fusion essential in order to mitigate against misdirection and hopefully provide a key to finding the missing link(s) in complex mental illness.