There is growing multidisciplinary interest in multimodal synthesis technology as a way to support diverse modal interpretation across application contexts. The existing literature largely addresses modality-based systems within a single, known context, leaving a gap in fusing multiple modality data types across different contexts; an analytical review of recent developments in data fusion is therefore needed. The need for modality diversity across multiple contextual representations stems from the conflicting nature of data produced by multi-target sensors, which introduces further obstacles such as ambiguity, uncertainty, imbalance, and redundancy in multi-object classification. Moreover, there is a lack of frameworks able to analyze offline stream data to uncover hidden relationships among different modal data types and modality counts. Finally, the absence of a multimodal fusion model capable of determining the extraction conditions of the fused data has led to low accuracy in classifying objects across modalities and systems.
This paper proposes a new adaptive, late multimodal fusion framework that interprets multiple modalities and contextual representations using evidence-enhanced, deep learning-based Dempster-Shafer theory. The proposed framework is a MultiFusion learning model that addresses modality- and context-based fusion to improve remote management, intelligent systems, and decision making. It handles the contradictory nature of uncertain data and the diversity of methods, factors, conditions, and relationships involved in multimodal explanation within multi-context systems, improving decision making and control across diverse contextual representations. Furthermore, this research provides a comparative analysis of the proposed fusion model and prior multimodal data fusion models, covering their construction, mathematical formulation, advantages, and drawbacks. It also compares the proposed framework with previously published fusion frameworks, examining their concepts, advantages, problems, drivers, and current techniques. In experiments spanning multiple modalities and contexts, the proposed multimodal fusion framework achieves an accuracy of 98.45%. Finally, some directions for future research are discussed.
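The abstract does not specify the exact evidence-combination operator used in the late-fusion stage, so the following is only a minimal illustrative sketch of the classical Dempster's rule of combination applied to per-modality mass functions; the modality names, class labels, and mass values are hypothetical and are not taken from the paper.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dicts mapping frozenset hypotheses to masses)
    using Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for (h1, v1), (h2, v2) in product(m1.items(), m2.items()):
        inter = h1 & h2
        if inter:
            combined[inter] = combined.get(inter, 0.0) + v1 * v2
        else:
            conflict += v1 * v2  # mass assigned to mutually exclusive hypotheses
    if conflict >= 1.0:
        raise ValueError("Total conflict: the evidence sources cannot be combined")
    # Normalize by (1 - conflict) so the combined masses sum to 1
    return {h: v / (1.0 - conflict) for h, v in combined.items()}

# Hypothetical late fusion of two modality classifiers over classes {car, person};
# mass on the full frame {car, person} represents residual ignorance.
audio = {frozenset({"car"}): 0.6, frozenset({"person"}): 0.1,
         frozenset({"car", "person"}): 0.3}
video = {frozenset({"car"}): 0.7, frozenset({"person"}): 0.2,
         frozenset({"car", "person"}): 0.1}
print(dempster_combine(audio, video))
```

In such a late-fusion scheme, each modality's classifier output is first converted into a mass function, and the combined masses are then used for the final multi-object classification decision.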