2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DOI: 10.1109/iros45743.2020.9341699
Gimme Signals: Discriminative signal encoding for multimodal activity recognition

Abstract: We present a simple, yet effective and flexible method for action recognition supporting multiple sensor modalities. Multivariate signal sequences are encoded in an image and are then classified using a recently proposed EfficientNet CNN architecture. Our focus was to find an approach that generalizes well across different sensor modalities without specific adaptations while still achieving good results. We apply our method to 4 action recognition datasets containing skeleton sequences, inertial and motion capt…
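The abstract's core idea is to turn a multivariate signal sequence (e.g. skeleton joints or inertial channels over time) into an image that a standard CNN such as EfficientNet can classify. A minimal sketch of that general encoding step, assuming per-channel min-max normalization and a nearest-neighbor resize (not the authors' exact encoding):

```python
import numpy as np

def encode_signals_as_image(signals, height=224, width=224):
    """Encode a multivariate signal sequence (channels x time) as a
    grayscale image: normalize each channel independently, then resize
    to a fixed CNN input size. A sketch of the general idea only."""
    signals = np.asarray(signals, dtype=np.float32)
    # Per-channel min-max normalization to [0, 255].
    mins = signals.min(axis=1, keepdims=True)
    maxs = signals.max(axis=1, keepdims=True)
    norm = (signals - mins) / np.maximum(maxs - mins, 1e-8) * 255.0
    # Nearest-neighbor resize: map output pixels back to source indices.
    row_idx = np.arange(height) * signals.shape[0] // height
    col_idx = np.arange(width) * signals.shape[1] // width
    return norm[row_idx][:, col_idx].astype(np.uint8)
```

The resulting `height x width` array can be fed (after channel replication) to any image classifier; the per-channel normalization keeps modalities with very different value ranges comparable.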

Cited by 42 publications (21 citation statements)
References 53 publications (110 reference statements)
“…In order to show generalization of our action segmentation approach we integrate a recently proposed sparse graph-based representation for multi-modal action recognition [8]. In Table 5 we compare two such graph representations with similar dense representations.…”
Section: Results
confidence: 99%
“…The latter is referred to as TSSI in the following sections. Other than these dense representations, we use graphs to convert a skeleton sequence into an image, like Memmesheimer et al [8]. Here, each coordinate of every joint is plotted into one combined image.…”
Section: Image Assembly
confidence: 99%
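The graph-style representation quoted above draws every joint-coordinate time series as a curve in a single shared image. A rough sketch of that idea, assuming a simple per-channel polyline rasterization (hypothetical helper, not the exact method of [8]):

```python
import numpy as np

def plot_joints_to_image(joint_seq, size=128):
    """Rasterize every joint-coordinate time series of a skeleton
    sequence (channels x time) into one grayscale image, drawing each
    channel as a polyline. Illustrative sketch only."""
    seq = np.asarray(joint_seq, dtype=np.float32)
    canvas = np.zeros((size, size), dtype=np.uint8)
    # Map time steps to x-pixel positions.
    xs = np.arange(seq.shape[1]) * (size - 1) // max(seq.shape[1] - 1, 1)
    for channel in seq:
        lo, hi = channel.min(), channel.max()
        # Map values to y-pixel positions (full image height per channel).
        ys = ((channel - lo) / max(hi - lo, 1e-8) * (size - 1)).astype(int)
        canvas[size - 1 - ys, xs] = 255  # flip y so larger values sit higher
    return canvas
```

All channels share one canvas, which is what distinguishes this "combined image" style from dense row-per-channel encodings.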
“…Imran and Raman [284] designed a three-stream network architecture, where a 1D-CNN for gyroscopic data, a 2D-CNN for RGB data, and an RNN for skeleton data were used, and late fusion was adopted to predict the final class label. Memmesheimer et al [389] transformed different modalities as images, and utilized CNN to perform HAR.…”
Section: Fusion of Visual and Sensor Modalities
confidence: 99%
“…This allows a flexible per-modality model design, but comes at the computational cost of the multiple streams that need to be trained. For early fusion approaches, multiple modalities are fused on a representation level [32], reducing the training process to a single model but potentially losing the more descriptive features from per-modality models. Kong et al [20] presented a multi-modality distillation model.…”
Section: Skeleton-based Action Recognition
confidence: 99%