A Review on Methods and Applications in Multimodal Deep Learning

Summaira, Jabeen; Li, Xi; Shoib, Amin Muhammad; Jabbar, Abdul

doi:10.48550/arxiv.2202.09195

Cited by 2 publications

(4 citation statements)

References 90 publications

(144 reference statements)


“…The survey focuses on audio, video, and text modalities. Another article [8] reviewed applied methods and applications in multimodal deep learning, where the authors concentrated on a few common deep learning (DL) methods and applications. Apart from this article, we discovered a few surveys that discussed how MML could solve different problems related to modalities.…”

Section: Related Studies (mentioning)

confidence: 99%

“…Although article [1] focused on three modalities, this paper contained all the possible modalities. Article [8] discussed the current use of ML methods and applications in MML, but they limited their review by selecting typical ML methods and applications. Contrary this study presented all ML algorithms, domains, and applications there were available in the search range.…”

Section: Related Studies (mentioning)

confidence: 99%


A Systematic Literature Review on Multimodal Machine Learning: Applications, Challenges, Gaps and Future Directions

2023


Multimodal machine learning (MML) is a tempting multidisciplinary research area where heterogeneous data from multiple modalities and machine learning (ML) are combined to solve critical problems. Usually, research works use data from a single modality, such as images, audio, text, or signals. However, real-world issues have become critical now, and handling them using multiple modalities of data instead of a single modality can significantly impact finding solutions. ML algorithms play an essential role by tuning parameters in developing MML models. This paper reviews recent advancements in the challenges of MML, namely representation, translation, alignment, fusion, and co-learning, and presents the gaps and challenges. A systematic literature review (SLR) was applied to define the progress and trends on those challenges in the MML domain. In total, 1032 articles were examined in this review to extract features such as source, domain, application, and modality. This research article will help researchers understand the constant state of MML and navigate the selection of future research directions.


“…Multi-Modal Translation, defined as the task to transfer or translate knowledge from a source modality to a target one [22], enables one to learn a mapping from a source modality to a target one. Multi-Modal Translation includes variety of applications, such as Image Captioning [8] (generation of a textual representation from an image) and Multi-Modal Speech synthesis [22] (generating audio given its textual representation). It is worth mentioning that Multi-Modal translation where the target modality is high-dimensional can get extremely challenging; one way to respond to this challenge is translating to a low-dimensional representation of the target modality containing higher level of semantic information in comparison with the input belonging to the source modality [27].…”

Section: Multi-source Fault Diagnosis (mentioning)

confidence: 99%

curr2vib: Modality Embedding Translation for Broken-Rotor Bar Detection

Berenji; Taghiyarrenani; Nowaczyk

2023

Communications in Computer and Information Science


Recently, due to advances in sensor technology and the Internet of Things, the operation of machinery can be monitored using a higher number of sources and modalities. In this study, we demonstrate that Multi-Modal Translation is capable of transferring knowledge from a modality with a higher level of applicability (more usefulness for solving a specific task) but a lower level of accessibility (how easy and affordable it is to collect information from this modality) to another one with a higher level of accessibility but a lower level of applicability. Unlike the fusion of multiple modalities, which requires all of the modalities to be available during the deployment stage, our proposed method depends only on the more accessible one, which reduces the cost of instrumentation equipment. The presented case study demonstrates that by employing the proposed method we are capable of replacing five acceleration sensors with three current sensors, while the classification accuracy is also increased by more than 1%.
