Fingerspelling Recognition with Semi-Markov Conditional Random Fields

Kim, Tae‐Hwan; Shakhnarovich, Greg; Livescu, Karen

doi:10.1109/iccv.2013.192

Cited by 27 publications

(35 citation statements)

References 34 publications


“…Much previous work on sign language recognition, and the vast majority of previous work on fingerspelling recognition, uses some form of hand detection or segmentation to localize the region(s) of interest as an initial step. Kim et al [18,19,17] estimate a signer-dependent skin color model using manually annotated hand regions for fingerspelling recognition. Huang et al [15] learn a hand detector based on Faster R-CNN [33] using manually annotated signing hand bounding boxes, and apply it to general sign language recognition.…”

Section: Related Work (mentioning)

confidence: 99%

Fingerspelling Recognition in the Wild With Iterative Visual Attention

Shi, Rio, Keane, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Self Cite


Sign language recognition is a challenging gesture sequence recognition problem, characterized by quick and highly coarticulated motion. In this paper we focus on recognition of fingerspelling sequences in American Sign Language (ASL) videos collected in the wild, mainly from YouTube and Deaf social media. Most previous work on sign language recognition has focused on controlled settings where the data is recorded in a studio environment and the number of signers is limited. Our work aims to address the challenges of real-life data, reducing the need for detection or segmentation modules commonly used in this domain. We propose an end-to-end model based on an iterative attention mechanism, without explicit hand detection or segmentation. Our approach dynamically focuses on increasingly high-resolution regions of interest. It outperforms prior work by a large margin. We also introduce a newly collected data set of crowdsourced annotations of fingerspelling in the wild, and show that performance can be further improved with this additional data set.



“…Hand gesture recognition provides a means to decode the information expressed by the reported categories which are always more used to interact with innovative applications, such as interactive games [3], [4], serious games [5], [6], sign language recognition [7]-[10], emotional expression identification [11], [12], remote control in robotics [13], [14] or alternative computer interfaces [15]-[18]. In general, the approaches used in hand gesture recognition can be divided into two main classes: 3D model-based [19] and appearance-based [20].…”

Section: Introduction (mentioning)

confidence: 99%

Exploiting Recurrent Neural Networks and Leap Motion Controller for the Recognition of Sign Language and Semaphoric Hand Gestures

Avola, Bernardi, Cinque, et al. 2019

IEEE Trans. Multimedia

158


In human interactions, hands are a powerful way of expressing information that, in some cases, can be used as a valid substitute for voice, as it happens in Sign Language. Hand gesture recognition has always been an interesting topic in the areas of computer vision and multimedia. These gestures can be represented as sets of feature vectors that change over time. Recurrent Neural Networks (RNNs) are suited to analyse this type of sets thanks to their ability to model the long term contextual information of temporal sequences. In this paper, a RNN is trained by using as features the angles formed by the finger bones of human hands. The selected features, acquired by a Leap Motion Controller (LMC) sensor, have been chosen because the majority of human gestures produce joint movements that generate truly characteristic corners. A challenging subset composed by a large number of gestures defined by the American Sign Language (ASL) is used to test the proposed solution and the effectiveness of the selected angles. Moreover, the proposed method has been compared to other state of the art works on the SHREC dataset, thus demonstrating its superiority in hand gesture recognition accuracy.

• the search of a robust solution able to recognize also gestures that are similar to each other;
• the achievement of the highest accuracy level compared with works of the current literature.


“…As in a number of other domains, convolutional neural networks (CNNs) have recently been replacing engineered features in sign language recognition research [17,18,19,11,8]. For sequence modeling, most previous work has used hidden Markov models (HMMs) [20,17,18,13], and some has used segmental conditional random fields [21,22,13]. Much of this work relies on frame-level labels for the training data.…”

Section: Related Work (mentioning)

confidence: 99%

“…Most fingerspelling recognition approaches begin by extracting the signing hand from the image frames [21,13,11]. Due to the high quality of video used in prior work, hand detection (or segmentation) is usually treated as a preprocessing step with high accuracy, with little analysis of its impact on performance.…”

Section: Related Work (mentioning)

confidence: 99%

American Sign Language Fingerspelling Recognition in the Wild

Shi, Rio, Keane, et al. 2018

2018 IEEE Spoken Language Technology Workshop (SLT)

Self Cite


We address the problem of American Sign Language fingerspelling recognition "in the wild", using videos collected from websites. We introduce the largest data set available so far for the problem of fingerspelling recognition, and the first using naturally occurring video data. Using this data set, we present the first attempt to recognize fingerspelling sequences in this challenging setting. Unlike prior work, our video data is extremely challenging due to low frame rates and visual variability. To tackle the visual challenges, we train a special-purpose signing hand detector using a small subset of our data. Given the hand detector output, a sequence model decodes the hypothesized fingerspelled letter sequence. For the sequence model, we explore attention-based recurrent encoder-decoders and CTC-based approaches. As the first attempt at fingerspelling recognition in the wild, this work is intended to serve as a baseline for future work on sign language recognition in realistic conditions. We find that, as expected, letter error rates are much higher than in previous work on more controlled data, and we analyze the sources of error and effects of model variants.

Index Terms: American Sign Language, fingerspelling, connectionist temporal classification, attention models

Footnote 2: Two-handed fingerspelling occasionally occurs, including in our data.
