“…This paradigm has been broadly adopted in many applications such as image segmentation (Yang, Rao and Ma, 2006), image compression (Hong, Wright, Huang and Ma, 2006), and object clustering (Ho, Yang, Lim, Lee and Kriegman, 2003). Uncovering the principles and laying out the fundamentals of multi-modal data has become an important research topic in light of its many applications in diverse fields, including image fusion (Hellwich and Wiedemann, 2000), target recognition (Korona and Kokar, 1996; Ghanem, Panahi, Krim, Kerekes and Mattingly, 2018; Ghanem, Roheda and Krim, 2021; Wang, Skau, Krim and Cervone, 2018), speaker recognition (Soong and Rosenberg, 1988), and handwriting analysis (Xu, Krzyzak and Suen, 1992). Convolutional neural networks have been widely used on multi-modal data, as in Ngiam, Khosla, Kim, Nam, Lee and Ng (2011); Ramachandram and Taylor (2017); Valada, Oliveira, Brox and Burgard (2016); Roheda, Riggan, Krim and Dai (2018b); and Roheda, Krim, Luo and Wu (2018a).…”