2019
DOI: 10.1109/access.2019.2940997

One-Shot Learning Hand Gesture Recognition Based on Lightweight 3D Convolutional Neural Networks for Portable Applications on Mobile Systems

Abstract: Although deep convolutional neural networks (CNNs) have made great breakthroughs in vision-based gesture recognition, it is challenging to deploy these high-performance networks on resource-constrained mobile platforms and to acquire the large numbers of labeled samples needed to train them. Furthermore, some application scenarios offer only a few samples, or even a single one, for a new gesture class, so that CNN-based recognition methods cannot achieve satisfactory classificatio…

Cited by 16 publications (4 citation statements)
References 42 publications
“…In the case of video analysis, the input is a sequence of frames, and the 3D convolutional layer applies a kernel to each frame and its neighboring frames in the temporal dimension to extract features that capture both spatial and temporal information. This allows the model to learn patterns and movements over time, which is crucial for tasks such as action recognition and gesture recognition (see Figure 3) [24]. After the 3D convolutional layers, pooling layers are often used to downsample the feature maps and reduce the spatial dimensionality of the data.…”
Section: D Convolutions
Citation type: mentioning
confidence: 99%
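
The sliding of a kernel over each frame and its temporal neighbors, as described in the statement above, can be sketched in a few lines. This is a minimal illustration only, assuming a single channel, stride 1, and no padding; the function name `conv3d_valid` and the toy clip are hypothetical and are not taken from the cited paper or from any library.

```python
def conv3d_valid(video, kernel):
    """Naive single-channel 3D convolution (cross-correlation form).

    video:  nested list indexed [t][y][x]   (frames, rows, columns)
    kernel: nested list indexed [dt][dy][dx]
    Stride 1 and no padding are assumed, so each output axis shrinks
    by (kernel size - 1).
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):              # slide along time
        plane = []
        for y in range(H - kh + 1):          # slide along height
            row = []
            for x in range(W - kw + 1):      # slide along width
                acc = 0.0
                for dt in range(kt):         # neighboring frames
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy clip: three 2x2 frames whose brightness changes over time.
clip = [
    [[0, 0], [0, 0]],
    [[1, 1], [1, 1]],
    [[3, 3], [3, 3]],
]
# A 2x1x1 kernel that subtracts each frame from the next one,
# so the output responds to change over time rather than to appearance.
temporal_diff = [[[-1.0]], [[1.0]]]

motion = conv3d_valid(clip, temporal_diff)
print(motion)
```

With this purely temporal kernel the output has one frame fewer than the input, and each value is the brightness change between consecutive frames. A kernel that also spans several rows and columns would mix spatial and temporal information in the same way.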
“…Xu et al proposed a mushroom recognition system using a CNN. The authors collected a dataset of 1,811 mushroom images belonging to 23 different species and achieved an accuracy of 94.47% [1]. Hadi et al proposed a CNN-based mushroom recognition system that achieved an accuracy of 90.4%.…”
Section: Literature Survey
Citation type: mentioning
confidence: 99%
“…Through the quantization and pruning of CNN, a certain amount of parameters are reduced. Furthermore, Lu et al [198] proposed a lightweight I3D-based network for gesture recognition. This model has spatio-temporal separable 3D convolution and fire modules, which can effectively extract discriminative features.…”
Section: ) Intelligent Transportation Systems (Its)mentioning
confidence: 99%
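
To make concrete why a spatio-temporal separable 3D convolution reduces parameters, the following sketch compares weight counts. It assumes one common factorization, a 1×k×k spatial convolution followed by a k×1×1 temporal convolution with the intermediate channel count equal to the output channel count. The layer sizes are hypothetical, biases are ignored, fire modules are not modeled, and nothing here is taken from the architecture of Lu et al.

```python
def conv3d_params(c_in, c_out, kt, kh, kw):
    """Weight count of a dense 3D convolution (biases ignored)."""
    return c_in * c_out * kt * kh * kw


def separable_conv3d_params(c_in, c_out, kt, kh, kw):
    """Weight count when the kernel is factorized into a spatial
    1 x kh x kw convolution followed by a temporal kt x 1 x 1 convolution.
    Assumption: the intermediate channel count equals c_out."""
    spatial = conv3d_params(c_in, c_out, 1, kh, kw)
    temporal = conv3d_params(c_out, c_out, kt, 1, 1)
    return spatial + temporal


# Hypothetical layer: 64 input channels, 64 output channels, 3x3x3 kernel.
full = conv3d_params(64, 64, 3, 3, 3)
separable = separable_conv3d_params(64, 64, 3, 3, 3)

print(full, separable, full / separable)
```

For this hypothetical layer the factorized form needs 49,152 weights against 110,592 for the dense kernel, a 2.25× reduction. The ratio depends on the kernel and channel sizes chosen.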