Deep Forest-Based Monocular Visual Sign Language Recognition

Xue, Qing; Li, Xuanpeng; Wang, Dong; Zhang, Weigong

doi:10.3390/app9091945

Cited by 11 publications

(3 citation statements)

References 35 publications

“…Existing word-level sign recognition models are mainly trained and evaluated on either private [26,38,78,28,49] or small-scale datasets with less than one hundred words [26,38,78,28,49,42,47,71]. These sign recognition approaches mainly consists of three steps: the feature extraction, temporal-dependency modeling and classification.…”

Section: Sign Language Recognition Approaches (mentioning)

confidence: 99%

Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison

Opazo et al. 2020

2020 IEEE Winter Conference on Applications of Computer Vision (WACV)

Vision-based sign language recognition aims at helping hearing-impaired people communicate with others. However, most existing sign language datasets are limited to a small number of words. Due to the limited vocabulary size, models learned from those datasets cannot be applied in practice. In this paper, we introduce a new large-scale Word-Level American Sign Language (WLASL) video dataset, containing more than 2000 words performed by over 100 signers. This dataset will be made publicly available to the research community. To our knowledge, it is by far the largest public ASL dataset to facilitate word-level sign recognition research. Based on this new large-scale dataset, we are able to experiment with several deep learning methods for word-level sign recognition and evaluate their performance in large-scale scenarios. Specifically, we implement and compare two different models, i.e., (i) a holistic visual appearance-based approach and (ii) a 2D human pose-based approach. Both models are valuable baselines that will benefit the community for method benchmarking. Moreover, we also propose a novel pose-based temporal graph convolution network (Pose-TGCN) that models spatial and temporal dependencies in human pose trajectories simultaneously, which further boosts the performance of the pose-based method. Our results show that pose-based and appearance-based models achieve comparable performance, up to 62.63% top-10 accuracy on 2,000 words/glosses, demonstrating the validity and challenges of our dataset. We will make the large-scale dataset, as well as our baseline deep models, freely available on GitHub.

“…Neural network models include CNN (convolutional neural network), RNN (recurrent neural network), GNN (graph neural network), etc. These have all been used in sign language research [ 29 ].…”

Section: Related Work (mentioning)

confidence: 99%

A Sign Language Recognition System Applied to Deaf-Mute Medical Consultation

Xia, Lü, Fan et al. 2022

Sensors

It is an objective reality that deaf-mute people have difficulty seeking medical treatment. Due to the lack of sign language interpreters, most hospitals in China currently do not have the ability to interpret sign language. Normal medical treatment is a luxury for deaf people. In this paper, we propose a sign language recognition system: Heart-Speaker. Heart-Speaker is applied to a deaf-mute consultation scenario. The system provides a low-cost solution for the difficult problem of treating deaf-mute patients. The doctor only needs to point the Heart-Speaker at the deaf patient and the system automatically captures the sign language movements and translates the sign language semantics. When a doctor issues a diagnosis or asks a patient a question, the system displays the corresponding sign language video and subtitles to meet the needs of two-way communication between doctors and patients. The system uses the MobileNet-YOLOv3 model to recognize sign language. It meets the needs of running on embedded terminals and provides favorable recognition accuracy. We performed experiments to verify the accuracy of the measurements. The experimental results show that the accuracy rate of Heart-Speaker in recognizing sign language can reach 90.77%.

“…In some cases, top-1, top-5, and top-10 accuracy were calculated, expressing the model's ability to identify 'most likely' candidates rather than one correct answer. A BLEU score was used to assess the quantitative output of translation models with values between 0 and 100 as depicted in Table 15, while qualitative analysis was based on comparison with ground … [flattened table of input modalities (RGB video, Kinect, depth video, 3D skeletal data, infrared) and reference numbers] …”

Section: B Performance Evaluation (mentioning)

confidence: 99%

Deep Learning for Sign Language Recognition: Current Techniques, Benchmarks, and Open Issues

Al-Qurishi, Khalid, Souissi 2021

IEEE Access

People with hearing impairments are found worldwide; therefore, the development of effective local level sign language recognition (SLR) tools is essential. We conducted a comprehensive review of automated sign language recognition based on machine/deep learning methods and techniques published between 2014 and 2021 and concluded that the current methods require conceptual classification to interpret all available data correctly. Thus, we turned our attention to elements that are common to almost all sign language recognition methodologies. This paper discusses their relative strengths and weaknesses, and we propose a general framework for researchers. This study also indicates that input modalities bear great significance in this field; it appears that recognition based on a combination of data sources, including vision-based and sensor-based channels, is superior to a unimodal analysis. In addition, recent advances have allowed researchers to move from simple recognition of sign language characters and words towards the capacity to translate continuous sign language communication with minimal delay. Many of the presented models are relatively effective for a range of tasks, but none currently possess the necessary generalization potential for commercial deployment. However, the pace of research is encouraging, and further progress is expected if specific difficulties are resolved.
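
The top-1, top-5, and top-10 accuracy figures mentioned in the citation statement above can be illustrated with a short Python sketch. It is not taken from any of the cited papers: the function name `top_k_accuracy` and the toy scores and labels are assumptions made up for this example. A sample counts as correct when its true class is among the k highest-scoring classes.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Fraction of samples whose true label is among the k top-scoring classes.

    Hypothetical helper for illustration only.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    if k < 1:
        raise ValueError("k must be at least 1")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy data (assumed): 4 samples, 3 classes.
scores = [
    [0.1, 0.6, 0.3],  # true class 1 is ranked first
    [0.5, 0.2, 0.3],  # true class 2 is ranked second
    [0.2, 0.3, 0.5],  # true class 0 is ranked third
    [0.7, 0.2, 0.1],  # true class 0 is ranked first
]
labels = [1, 2, 0, 0]

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(scores, labels, k):.2f}")
```

On this toy data the accuracy rises from 0.50 at k = 1 to 0.75 at k = 2 and 1.00 at k = 3, which is why top-k figures for larger k are always at least as high as top-1 on the same predictions.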
