2011 IEEE International Conference on Signal and Image Processing Applications (ICSIPA) 2011
DOI: 10.1109/icsipa.2011.6144164

Max-pooling convolutional neural networks for vision-based hand gesture recognition

Abstract: Automatic recognition of gestures using computer vision is important for many real-world applications such as sign language recognition and human-robot interaction (HRI). Our goal is a real-time hand gesture-based HRI interface for mobile robots. We use a state-of-the-art big and deep neural network (NN) combining convolution and max-pooling (MPCNN) for supervised feature learning and classification of hand gestures given by humans to mobile robots using colored gloves. The hand contour is retrieved b…
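The abstract names the two operations an MPCNN alternates: convolution and max-pooling. The following is a minimal sketch of one such stage in plain Python, not the authors' implementation (the abstract does not give its details). It assumes a single-channel input, a single kernel, stride-1 "valid" convolution written as cross-correlation (the usual convention in CNN code), and non-overlapping 2×2 max-pooling. The function names, the toy input, and the all-ones kernel are hypothetical, and the nonlinearity normally applied between the two operations is omitted for brevity.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Stride-1 'valid' cross-correlation: the kernel is applied only where it
    fits entirely inside the image, so the output is (H-k+1) x (W-k+1)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + u][j + v] * kernel[u][v] for u in range(kh) for v in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]


def max_pool(fmap: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max-pooling: each output is the maximum of one
    size x size block; rows/columns that do not fill a block are dropped."""
    return [
        [
            max(fmap[i + u][j + v] for u in range(size) for v in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


# Hypothetical 5x5 toy input with values 1..25 and a 2x2 all-ones kernel.
image = [[5 * r + c + 1 for c in range(5)] for r in range(5)]
kernel = [[1, 1], [1, 1]]

fmap = conv2d_valid(image, kernel)  # 4x4 feature map
pooled = max_pool(fmap)             # 2x2 after 2x2 max-pooling
print(pooled)  # [[40, 48], [80, 88]]
```

Max-pooling keeps only the strongest response in each block. This halves each spatial dimension here and makes the stage's output less sensitive to small shifts of the input.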

Cited by 554 publications (264 citation statements); references 35 publications.
“…The translation invariant property leads to the question: why create a fully connected neural network? There is no need to have full connections because we always work with finite images. The farther the connection, the less importance to the computation [18].…”
Section: Convolutional Neural Network (citation type: mentioning)
confidence: 99%
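The quoted argument rests on two properties: locality (distant connections contribute little) and translation invariance (the same feature detector is useful everywhere in the image). A rough weight count shows what each buys. The sketch below assumes a single-channel input mapped to one "valid" output map and counts weights only, excluding biases. The 32×32 input and 5×5 kernel are hypothetical sizes chosen for illustration and are not taken from the paper or the citing work.

```python
def weight_counts(h: int, w: int, k: int) -> dict:
    """Weights (biases excluded) needed to map an h x w single-channel input
    to one output map of size (h-k+1) x (w-k+1)."""
    out_units = (h - k + 1) * (w - k + 1)
    return {
        # every output unit is connected to every input pixel
        "fully_connected": h * w * out_units,
        # locality: each output unit sees only its own k x k patch
        "locally_connected": out_units * k * k,
        # locality + weight sharing: one k x k kernel reused at every position
        "convolutional": k * k,
    }


print(weight_counts(32, 32, 5))
# {'fully_connected': 802816, 'locally_connected': 19600, 'convolutional': 25}
```

Dropping distant connections accounts for most of the reduction. Sharing one kernel across all positions, which is justified by translation invariance, reduces the count further to the kernel size alone.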
“…These include three well-known ConvNets for computer vision applications with different computational loads, namely the Convolutional Face Finder (CFF) [12], LeNet-5 [13] and MPCNN [14]. We also implemented two ConvNets from existing FPGA works [4][7] for scene labelling and sign recognition which we denote CNP and Sign Recognition respectively, and one ConvNet for scene labelling from an embedded GPU work [15].…”
Section: A. Benchmarks (citation type: mentioning)
confidence: 99%
“…The receptive field is the size of the kernel used. Each feature map represents distinct extracted features from the input, e.g., one may represent vertical edges and another may represent points [8].…”
Section: Kernels Convolution and Subsampling (citation type: mentioning)
confidence: 99%
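The last statement says each unit's receptive field equals the kernel size and that each feature map captures a distinct feature, such as vertical edges. The following sketch makes that concrete in plain Python under stated assumptions. It uses a single convolutional layer with stride 1 and no padding, and two hand-set 2×2 kernels that stand in for what a trained network would learn. The kernel values, the test image, and the function name are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[int]]


def feature_map(image: Matrix, kernel: Matrix) -> Matrix:
    """Stride-1, unpadded cross-correlation. Each output unit depends only on
    a kernel-sized patch of the input, which is its receptive field."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + u][j + v] * kernel[u][v] for u in range(kh) for v in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]


# Hypothetical hand-set kernels; one kernel produces one feature map.
KERNELS: Dict[str, Matrix] = {
    "vertical_edges": [[-1, 1], [-1, 1]],
    "horizontal_edges": [[-1, -1], [1, 1]],
}

# 4x4 test image: dark left half, bright right half (one vertical edge).
image = [[0, 0, 1, 1] for _ in range(4)]

maps = {name: feature_map(image, k) for name, k in KERNELS.items()}
for name, fmap in maps.items():
    print(name, fmap)
# vertical_edges [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
# horizontal_edges [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

Only the vertical-edge map responds, and only in the column whose 2×2 receptive field straddles the edge. Each map is (4 − 2 + 1) × (4 − 2 + 1) = 3 × 3.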