2019
DOI: 10.3390/app9091945

Deep Forest-Based Monocular Visual Sign Language Recognition

Abstract: Sign language recognition (SLR) is a bridge linking the hearing impaired and the general public. Some SLR methods using wearable data gloves are not portable enough to provide a daily sign language translation service, while visual SLR is more flexible to work with in most scenes. This paper introduces a monocular vision-based approach to SLR. Human skeleton action recognition is proposed to express semantic information, including the representation of signs' gestures, using the regularization of body joint feat…
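The abstract pairs skeleton-derived joint features with a deep forest classifier. As a rough illustration of the cascade-forest idea only (not the paper's actual model), the sketch below stacks random-forest layers that each append their class probabilities to the input features, gcForest-style; the layer count, feature dimension, and all names are assumptions.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Hypothetical cascade ("deep") forest: each layer augments the features
    # with its own class-probability outputs before the next layer is fit.
    class CascadeForest:
        def __init__(self, n_layers=3, n_estimators=100, random_state=0):
            self.layers = [RandomForestClassifier(n_estimators=n_estimators,
                                                  random_state=random_state + i)
                           for i in range(n_layers)]

        def fit(self, X, y):
            feats = X
            for forest in self.layers:
                forest.fit(feats, y)
                # Append this layer's class probabilities as new features.
                feats = np.hstack([feats, forest.predict_proba(feats)])
            return self

        def predict(self, X):
            feats = X
            for forest in self.layers[:-1]:
                feats = np.hstack([feats, forest.predict_proba(feats)])
            return self.layers[-1].predict(feats)

    # Toy usage: 200 samples of 34-dim joint features (17 joints x 2D), 5 sign classes.
    X = np.random.rand(200, 34)
    y = np.random.randint(0, 5, size=200)
    clf = CascadeForest().fit(X, y)
    print(clf.predict(X[:3]))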

Cited by 11 publications (3 citation statements)
References 35 publications
“…Existing word-level sign recognition models are mainly trained and evaluated on either private [26,38,78,28,49] or small-scale datasets with fewer than one hundred words [26,38,78,28,49,42,47,71]. These sign recognition approaches mainly consist of three steps: feature extraction, temporal-dependency modeling, and classification.…”
Section: Sign Language Recognition Approaches
confidence: 99%
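The three-step pipeline the excerpt describes (feature extraction, temporal-dependency modeling, classification) can be sketched as below. This is a generic illustration, not any cited model; the frame-feature dimension, hidden size, and class count are assumptions.

    import torch
    import torch.nn as nn

    # Generic word-level sign recognition pipeline (illustrative only):
    # 1) per-frame feature extraction, 2) temporal modeling, 3) classification.
    class SignRecognizer(nn.Module):
        def __init__(self, in_dim=34, hidden=128, n_classes=100):
            super().__init__()
            self.extract = nn.Linear(in_dim, hidden)                  # step 1
            self.temporal = nn.GRU(hidden, hidden, batch_first=True)  # step 2
            self.classify = nn.Linear(hidden, n_classes)              # step 3

        def forward(self, frames):                 # frames: (batch, time, in_dim)
            feats = torch.relu(self.extract(frames))
            _, last = self.temporal(feats)         # final hidden state: (1, batch, hidden)
            return self.classify(last.squeeze(0))  # logits: (batch, n_classes)

    logits = SignRecognizer()(torch.randn(2, 30, 34))  # 2 clips, 30 frames each
    print(logits.shape)  # torch.Size([2, 100])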
“…Neural network models include CNNs (convolutional neural networks), RNNs (recurrent neural networks), GNNs (graph neural networks), etc. All of these have been used in sign language research [29].…”
Section: Related Work
confidence: 99%
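Of the three model families named above, the graph variant maps most directly onto skeleton input: joints become nodes and bones become edges. A minimal sketch of one graph-convolution step over a hypothetical 5-joint skeleton (the adjacency matrix and dimensions are assumptions, not any cited architecture):

    import torch
    import torch.nn as nn

    # One graph-convolution step over skeleton joints (illustrative only).
    # A: row-normalized adjacency over joints; X: per-joint features.
    class SkeletonGraphConv(nn.Module):
        def __init__(self, in_dim, out_dim, adjacency):
            super().__init__()
            self.register_buffer("A", adjacency)  # (joints, joints)
            self.lin = nn.Linear(in_dim, out_dim)

        def forward(self, X):                        # X: (batch, joints, in_dim)
            return torch.relu(self.lin(self.A @ X))  # aggregate neighbors, then transform

    # Toy 5-joint skeleton: a central joint connected to four others.
    A = torch.tensor([[0, 1, 0, 0, 0],
                      [1, 0, 1, 1, 1],
                      [0, 1, 0, 0, 0],
                      [0, 1, 0, 0, 0],
                      [0, 1, 0, 0, 0]], dtype=torch.float)
    A = A / A.sum(dim=1, keepdim=True)               # row-normalize
    out = SkeletonGraphConv(2, 16, A)(torch.randn(8, 5, 2))  # 8 frames, 2D joints
    print(out.shape)  # torch.Size([8, 5, 16])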
“…In some cases, top-1, top-5, and top-10 accuracy were calculated, expressing the model's ability to identify the 'most likely' candidates rather than one correct answer. A BLEU score was used to assess the quantitative output of translation models, with values between 0 and 100 as depicted in Table 15, while qualitative analysis was based on comparison with ground truth.…”
[Flattened table residue: a per-reference list of input modalities (RGB video, Kinect, depth video, 3D skeletal data, infrared, etc.) for works [3], [37], [41], [47], [49], [50], [66]-[70], [128], [130], [133], [157], [158], [185], [187], [189]-[191], [193], [195] spilled into the excerpt here; the pairing of references to modalities is not recoverable from the extraction.]
Section: B. Performance Evaluation
confidence: 99%
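As a concrete reference for the metrics named above, here is a small top-k accuracy helper; BLEU is typically delegated to a library such as sacrebleu rather than hand-rolled. The tensor shapes and names here are assumptions for illustration.

    import torch

    def top_k_accuracy(logits, labels, k=5):
        """Fraction of samples whose true label is among the k highest-scoring classes."""
        topk = logits.topk(k, dim=1).indices            # (batch, k) candidate classes
        hits = (topk == labels.unsqueeze(1)).any(dim=1)
        return hits.float().mean().item()

    logits = torch.randn(4, 100)                        # 4 samples, 100 sign classes
    labels = torch.tensor([3, 17, 42, 99])
    print(top_k_accuracy(logits, labels, k=1),          # top-1
          top_k_accuracy(logits, labels, k=5))          # top-5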