Abstract: With the prevalence of RGB-D cameras, multimodal video data have become more available for human action recognition. One main challenge for this task lies in how to effectively leverage their complementary information. In this work, we propose a Modality Compensation Network (MCN) to explore the relationships of different modalities, and boost the representations for human action recognition. We regard RGB/optical flow videos as source modalities, skeletons as auxiliary modality. Our goal is to extract more di…
“…Specifically, the teacher network was trained with RGB videos, providing supervision information for the student network handling skeleton data. Song et al. [397] proposed a Modality Compensation Network (MCN) that takes advantage of the skeleton modality to compensate the feature learning of the RGB modality with adaptive representation learning; a modality adaptation block with residual feature learning was designed to bridge information between modalities.…”
Section: Co-learning With Visual Modalities (mentioning)
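The quoted passage describes a modality adaptation block with residual feature learning only at a high level. As a rough illustration of the residual idea alone, and not the MCN authors' implementation, the sketch below adds a learned correction to an RGB feature vector. The function names, layer shapes, and weights are all hypothetical, and plain Python lists stand in for tensors.

```python
import random


def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def residual_adaptation(x, w_in, w_out):
    """Return x + w_out @ relu(w_in @ x).

    Hypothetical residual block: the input feature passes through
    unchanged, and the two-layer branch only learns a correction.
    """
    hidden = relu(matvec(w_in, x))
    correction = matvec(w_out, hidden)
    return [xi + ci for xi, ci in zip(x, correction)]


# Illustrative data only: a 4-dimensional "RGB feature" and random weights.
rng = random.Random(0)
dim, hidden_dim = 4, 3
rgb_feature = [rng.uniform(-1, 1) for _ in range(dim)]
w_in = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(hidden_dim)]

# With a zero-initialised output layer the block is an identity mapping.
w_out_zero = [[0.0] * hidden_dim for _ in range(dim)]
adapted = residual_adaptation(rgb_feature, w_in, w_out_zero)

# A small hand-checkable case: w_in @ x = 3, so the correction is [3, 6].
small = residual_adaptation([1.0, 2.0], [[1.0, 1.0]], [[1.0], [2.0]])
```

Because the correction is added to the input, a block of this kind leaves the source feature untouched when its output weights are zero, which is one common reason residual connections are used when adapting features between modalities.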
“…Due to the heterogeneous, cluttered, and dynamic background, the network confuses the action classes. Several attention-based and skeleton-modality-based approaches have been employed to generate discriminative features [15] [14]. Recently, some works have focused on extracting highly discriminative features using the Extreme Learning Machine (ELM) [66].…”
Section: B Discriminative Feature Learning (mentioning)
confidence: 99%
“…Because of noisy motion and the appearance of unimportant information, small inter-class differences and large intra-class variation make this task quite challenging [8] [9]. Decreasing the feature variation within each class and increasing the discrimination between features of different classes can be an effective solution to this issue [14].…”
Section: Introduction (mentioning)
confidence: 99%
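The intra-class versus inter-class idea in the passage above can be made concrete with a small measurement. The sketch below is not taken from the cited work: it assumes Euclidean distance and class centroids, and the function names are hypothetical. It reports the mean distance of samples to their own class centroid (which a discriminative method would want small) and the mean distance between class centroids (which it would want large).

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Mean vector of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def class_spread(features, labels):
    """Return (mean intra-class distance, mean inter-class distance).

    Assumes at least two distinct labels; distances are measured to and
    between class centroids.
    """
    groups = defaultdict(list)
    for feature, label in zip(features, labels):
        groups[label].append(feature)
    centroids = {label: centroid(vecs) for label, vecs in groups.items()}

    intra = sum(
        euclidean(feature, centroids[label])
        for feature, label in zip(features, labels)
    ) / len(features)

    keys = sorted(centroids)
    pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
    inter = sum(
        euclidean(centroids[a], centroids[b]) for a, b in pairs
    ) / len(pairs)
    return intra, inter


# Toy 2-D features: two classes whose centroids are 4 units apart,
# with every sample exactly 1 unit from its own centroid.
features = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
labels = [0, 0, 1, 1]
intra, inter = class_spread(features, labels)
```

A training objective built on this idea would push `intra` down and `inter` up; how the cited methods actually do so is not stated in the quoted text.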
“…Numerous action recognition techniques have been introduced to deal with this problem, based on depth information along with RGB frames from RGB-D datasets for 3D action recognition [65] [71], attention mechanisms [33] - [40], and the skeleton modality [14] [70] [72]. These published methods require pre-processing the data, which increases the latency and time complexity during prediction.…”
Section: Introduction (mentioning)
confidence: 99%
“…These published methods require pre-processing the data, which increases the latency and time complexity during prediction. Inspired by the work in [14], rather than pre-processing the data, we focus on increasing and then harmonizing the feature dis- [Figure 1. Overall procedure: a visual representation of the end-to-end process. The left part shows an example of a batch input with a conceptual histogram representation.]…”