2022
DOI: 10.1609/aaai.v36i1.19988

MuMu: Cooperative Multitask Learning-Based Guided Multimodal Fusion

Abstract: Multimodal sensors (visual, non-visual, and wearable) can provide complementary information to develop robust perception systems for recognizing activities accurately. However, it is challenging to extract robust multimodal representations due to the heterogeneous characteristics of data from multimodal sensors and disparate human activities, especially in the presence of noisy and misaligned sensor data. In this work, we propose a cooperative multitask learning-based guided multimodal fusion approach, MuMu, t…
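
To make the abstract's idea concrete, below is a minimal, hypothetical sketch of multitask-guided multimodal fusion in PyTorch. The module names, feature dimensions, and the auxiliary "activity-group" task are illustrative assumptions, not the MuMu architecture itself.

# A hypothetical sketch of multitask-guided multimodal fusion for activity
# recognition, loosely inspired by the abstract above. Names, dimensions, and
# the auxiliary group task are assumptions, not taken from the MuMu paper.
import torch
import torch.nn as nn

class GuidedMultimodalFusion(nn.Module):
    def __init__(self, visual_dim=512, inertial_dim=64, hidden_dim=128,
                 num_groups=5, num_activities=27):
        super().__init__()
        # Per-modality encoders map heterogeneous inputs to a shared space.
        self.visual_enc = nn.Sequential(nn.Linear(visual_dim, hidden_dim), nn.ReLU())
        self.inertial_enc = nn.Sequential(nn.Linear(inertial_dim, hidden_dim), nn.ReLU())
        # Auxiliary task head: predicts a coarse activity group from the fused
        # features; its output "guides" the fine-grained activity head.
        self.group_head = nn.Linear(2 * hidden_dim, num_groups)
        # Fine-grained activity head conditioned on the group logits.
        self.activity_head = nn.Linear(2 * hidden_dim + num_groups, num_activities)

    def forward(self, visual, inertial):
        fused = torch.cat([self.visual_enc(visual), self.inertial_enc(inertial)], dim=-1)
        group_logits = self.group_head(fused)
        activity_logits = self.activity_head(torch.cat([fused, group_logits], dim=-1))
        return group_logits, activity_logits

# Cooperative multitask training: both losses share the encoders, so the
# auxiliary group task regularizes the fused representation.
model = GuidedMultimodalFusion()
visual = torch.randn(8, 512)    # e.g. pooled video features (assumed shape)
inertial = torch.randn(8, 64)   # e.g. windowed IMU features (assumed shape)
group_y = torch.randint(0, 5, (8,))
act_y = torch.randint(0, 27, (8,))
g_logits, a_logits = model(visual, inertial)
loss = nn.functional.cross_entropy(g_logits, group_y) + \
       nn.functional.cross_entropy(a_logits, act_y)
loss.backward()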

Cited by 20 publications (3 citation statements) · References 46 publications
“…Using our VCMA dataset, the highest average accuracy of 99.81% is achieved by the augmented key point data and sensor data. The highest accuracy of the UTD-MHAD dataset is 97.6% (Islam 2022), and the highest accuracy of the Berkeley MHAD dataset is 99.6% (Ahmad and Khan 2020) using depth and inertial sensor data. Although the modalities used are different, they are all multimodal action recognition with sensor data.…”
Section: Discussion · Citation type: mentioning · Confidence: 99%
“…Moreover, hybrid fusion techniques have also been explored, combining feature-level and decision-level fusion approaches [17][18][19]. These techniques aim to leverage the benefits of both strategies by fusing low-level sensory features and high-level decision outputs.…”
Section: Related Work, 2.1 Multimodal Fusion · Citation type: mentioning · Confidence: 99%
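
As a rough illustration of the hybrid strategy this statement describes, the sketch below combines feature-level fusion (one classifier over concatenated features) with decision-level fusion (averaged per-modality class probabilities). All layer sizes and the mixing weight alpha are assumptions for illustration, not from any cited paper.

# A minimal sketch of hybrid fusion, assuming two modalities: the feature-level
# branch classifies concatenated features, the decision-level branch averages
# per-modality probabilities, and the final score mixes both.
import torch
import torch.nn as nn

class HybridFusion(nn.Module):
    def __init__(self, dim_a=64, dim_b=32, hidden=48, num_classes=10, alpha=0.5):
        super().__init__()
        self.enc_a = nn.Linear(dim_a, hidden)
        self.enc_b = nn.Linear(dim_b, hidden)
        # Decision-level branch: one classifier per modality.
        self.head_a = nn.Linear(hidden, num_classes)
        self.head_b = nn.Linear(hidden, num_classes)
        # Feature-level branch: classifier over concatenated features.
        self.head_fused = nn.Linear(2 * hidden, num_classes)
        self.alpha = alpha  # weight between the two fusion strategies

    def forward(self, xa, xb):
        fa, fb = torch.relu(self.enc_a(xa)), torch.relu(self.enc_b(xb))
        decision = (self.head_a(fa).softmax(-1) + self.head_b(fb).softmax(-1)) / 2
        feature = self.head_fused(torch.cat([fa, fb], dim=-1)).softmax(-1)
        return self.alpha * feature + (1 - self.alpha) * decision

probs = HybridFusion()(torch.randn(4, 64), torch.randn(4, 32))  # (4, 10) probabilities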
“…Although these modalities also bring their own limitations and challenges, especially for some real-world applications, they have a huge potential to contribute to activity recognition performance as well as to unlock the full capabilities of the skeleton and inertial modalities. Besides, future work could benefit from using more advanced and sophisticated encoder architectures that may enable additional cross-modal fusion strategies [94,95]. While different supervised and SSL methods have been adapted to HAR in research studies and this dissertation, in particular, there is a limited amount of studies that are focused on analyzing representations produced by these algorithms.…”
Section: Power of Multimodality in SSL · Citation type: mentioning · Confidence: 99%