2019
DOI: 10.1016/j.inffus.2019.02.010

EmbraceNet: A robust deep learning architecture for multimodal classification

Abstract: Classification using multimodal data arises in many machine learning applications. It is crucial not only to model cross-modal relationship effectively but also to ensure robustness against loss of part of data or modalities. In this paper, we propose a novel deep learning-based multimodal fusion architecture for classification tasks, which guarantees compatibility with any kind of learning models, deals with cross-modal information carefully, and prevents performance degradation due to partial absence of data…

Cited by 121 publications (98 citation statements) · References 48 publications
“…The performance then improves quickly (e.g. from 94.4% to 96.0% for {motion, sound}) for smoothing window size [15,40] seconds, and then slowly (e.g. from 96.0% to 96.8%) for smoothing window size [40,80] seconds.…”
Section: Post-processing Results (citation type: mentioning, confidence: 99%)
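The smoothing-window effect described in the excerpt above can be illustrated with a sliding majority vote over per-segment predictions. This is a minimal sketch, not the cited paper's actual post-processing code; the `smooth_predictions` helper and the activity labels are hypothetical.

```python
from collections import Counter

def smooth_predictions(preds, window):
    """Replace each prediction with the majority label inside a
    centered window of the given size (hypothetical helper; the
    cited work's exact smoothing rule may differ)."""
    half = window // 2
    smoothed = []
    for i in range(len(preds)):
        lo, hi = max(0, i - half), min(len(preds), i + half + 1)
        smoothed.append(Counter(preds[lo:hi]).most_common(1)[0][0])
    return smoothed

# A single mislabeled segment is corrected by its neighbors:
print(smooth_predictions(["walk"] * 3 + ["run"] + ["walk"] * 3, window=5))
# → ['walk', 'walk', 'walk', 'walk', 'walk', 'walk', 'walk']
```

Larger windows correct more spurious labels but respond more slowly to genuine class changes, which is consistent with the diminishing gains reported above.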
“…Many machine learning approaches have been proposed to fuse multimodal information for classification tasks [31], [33], [39], [40]. These approaches can be categorized as early integration (data-layer fusion) or late integration (decision-layer fusion).…”
Section: Introduction (citation type: mentioning, confidence: 99%)
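The two fusion categories named in the excerpt can be contrasted in a few lines. This is a generic sketch with toy NumPy vectors; the feature sizes, the two-modality setup, and the probability-averaging rule are assumptions, not the cited papers' implementations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-modality features for one sample (names are illustrative).
motion_feat = rng.normal(size=8)
sound_feat = rng.normal(size=8)

# Early integration (data-layer fusion): concatenate raw features,
# then train a single classifier on the joint vector.
early_input = np.concatenate([motion_feat, sound_feat])  # shape (16,)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Late integration (decision-layer fusion): classify each modality
# separately, then combine the class-probability outputs.
motion_probs = softmax(rng.normal(size=3))  # stand-in per-modality scores
sound_probs = softmax(rng.normal(size=3))
late_probs = (motion_probs + sound_probs) / 2  # simple averaging rule

print(early_input.shape, round(late_probs.sum(), 6))
```

Early fusion lets one model learn cross-modal interactions directly, while late fusion degrades more gracefully when a modality is missing, which is the robustness concern EmbraceNet targets.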
“…There are many recent examples of the use of autoencoders for such a purpose, e.g. in the field of robotics [27][28][29]. An advantage of multimodal autoencoders is that they can produce a vector of parameters based on the fusion of data originating from two or more different modalities.…”
Section: Methods (citation type: mentioning, confidence: 99%)
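The fused-vector idea in the excerpt can be sketched as a shared bottleneck that encodes two modalities into a single code vector. The forward pass below uses untrained random weights in place of a trained multimodal autoencoder, and all dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: two input modalities, one shared code.
d_a, d_b, d_code = 6, 4, 3
W_a = rng.normal(size=(d_code, d_a))   # encoder for modality A
W_b = rng.normal(size=(d_code, d_b))   # encoder for modality B
V_a = rng.normal(size=(d_a, d_code))   # decoder back to modality A
V_b = rng.normal(size=(d_b, d_code))   # decoder back to modality B

def encode(x_a, x_b):
    # Fusion step: sum the per-modality projections into one shared code.
    return np.tanh(W_a @ x_a + W_b @ x_b)

x_a, x_b = rng.normal(size=d_a), rng.normal(size=d_b)
code = encode(x_a, x_b)                    # the fused representation
recon_a, recon_b = V_a @ code, V_b @ code  # decoders reconstruct each modality

print(code.shape)
```

Training the reconstruction losses for both decoders is what forces `code` to carry information from both modalities, so it can later serve as the fused feature vector for a classifier.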
“…Feature extraction using DCNN models has achieved promising results in extracting high-level features for different classification tasks [23]- [25]. Since fine-tuning of well-established DCNN architectures has not previously achieved good performance on this dataset, for this study, we employ the DCNN descriptor approach [26]- [28] to extract features in order to represent the discriminative characteristics of different classes sufficiently.…”
Section: A Network Architecture (citation type: mentioning, confidence: 99%)
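The descriptor approach in the excerpt (keeping a pretrained DCNN frozen and using an intermediate layer's activations as features, rather than fine-tuning the whole network) can be sketched as follows. A fixed random projection stands in for the frozen pretrained layers; the input and feature sizes and the `dcnn_descriptor` name are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for frozen pretrained weights: never updated during training.
W_frozen = rng.normal(size=(128, 32 * 32))

def dcnn_descriptor(image):
    """Frozen forward pass: flatten the input and apply the fixed
    projection with a ReLU, yielding a feature vector that a separate
    classifier would be trained on."""
    x = image.reshape(-1)               # flatten H x W input
    return np.maximum(W_frozen @ x, 0)  # ReLU activations as the descriptor

image = rng.normal(size=(32, 32))
feat = dcnn_descriptor(image)           # 128-d descriptor for a classifier
print(feat.shape)
```

Only the downstream classifier is fit to the target dataset, which is why this route can outperform fine-tuning when the dataset is too small or too different to retrain the full network reliably.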