Aurora Martinez Del Rio scite author profile

We address the problem of American Sign Language fingerspelling recognition "in the wild", using videos collected from websites. We introduce the largest data set available so far for the problem of fingerspelling recognition, and the first using naturally occurring video data. Using this data set, we present the first attempt to recognize fingerspelling sequences in this challenging setting. Unlike prior work, our video data is extremely challenging due to low frame rates and visual variability. To tackle the visual challenges, we train a special-purpose signing hand detector using a small subset of our data. Given the hand detector output, a sequence model decodes the hypothesized fingerspelled letter sequence. For the sequence model, we explore attention-based recurrent encoder-decoders and CTC-based approaches. As the first attempt at fingerspelling recognition in the wild, this work is intended to serve as a baseline for future work on sign language recognition in realistic conditions. We find that, as expected, letter error rates are much higher than in previous work on more controlled data, and we analyze the sources of error and effects of model variants.

Index Terms: American Sign Language, fingerspelling, connectionist temporal classification, attention models
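
The abstract above reports results as letter error rates. A common way to compute such a rate is the Levenshtein edit distance between the hypothesized and reference letter sequences, divided by the reference length. The sketch below assumes that definition; the function names `edit_distance` and `letter_error_rate` are illustrative and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def letter_error_rate(ref, hyp):
    """Edit distance normalized by reference length (assumed definition)."""
    if len(ref) == 0:
        raise ValueError("reference sequence must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition the rate can exceed 1.0 when the hypothesis contains many insertions, which is one reason results are sometimes reported as letter accuracy instead.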

Fingerspelling Recognition in the Wild With Iterative Visual Attention

Shi, Rio, Keane, et al. 2019

Sign language recognition is a challenging gesture sequence recognition problem, characterized by quick and highly coarticulated motion. In this paper we focus on recognition of fingerspelling sequences in American Sign Language (ASL) videos collected in the wild, mainly from YouTube and Deaf social media. Most previous work on sign language recognition has focused on controlled settings where the data is recorded in a studio environment and the number of signers is limited. Our work aims to address the challenges of real-life data, reducing the need for detection or segmentation modules commonly used in this domain. We propose an end-to-end model based on an iterative attention mechanism, without explicit hand detection or segmentation. Our approach dynamically focuses on increasingly high-resolution regions of interest. It outperforms prior work by a large margin. We also introduce a newly collected data set of crowdsourced annotations of fingerspelling in the wild, and show that performance can be further improved with this additional data set.
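
The idea of focusing on increasingly high-resolution regions of interest can be illustrated with a small geometric sketch. It is not the authors' model: it assumes that each iteration yields a non-negative 2D attention map over the current region, and that the next region is a smaller box centered on the attention-weighted centroid. The names `attention_center` and `zoom_box` and the shrink `ratio` are hypothetical.

```python
def attention_center(attn):
    """Attention-weighted centroid (row, col) in normalized [0, 1] coordinates."""
    total = sum(sum(row) for row in attn)
    if total <= 0:
        raise ValueError("attention map must have positive mass")
    h, w = len(attn), len(attn[0])
    # Each cell contributes the normalized position of its center.
    cy = sum((i + 0.5) / h * v for i, row in enumerate(attn) for v in row) / total
    cx = sum((j + 0.5) / w * v for row in attn for j, v in enumerate(row)) / total
    return cy, cx


def zoom_box(box, center, ratio):
    """Shrink box = (top, left, height, width) by `ratio` around `center`.

    `center` is normalized within the current box; the new box is clamped
    so that it stays inside the current one.
    """
    top, left, height, width = box
    new_h, new_w = height * ratio, width * ratio
    cy = top + center[0] * height
    cx = left + center[1] * width
    new_top = min(max(cy - new_h / 2, top), top + height - new_h)
    new_left = min(max(cx - new_w / 2, left), left + width - new_w)
    return (new_top, new_left, new_h, new_w)
```

Because the box is tracked in full-frame coordinates, each iteration can crop from the original frame rather than from a downsampled copy, so the same output size covers a smaller area at a higher effective resolution.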

Identifying the Correlations Between the Semantics and the Phonology of American Sign Language and British Sign Language: A Vector Space Approach

et al. 2022

Over the history of research on sign languages, much scholarship has highlighted the pervasive presence of signs whose forms relate to their meaning in a non-arbitrary way. The presence of these forms suggests that sign language vocabularies are shaped, at least in part, by a pressure toward maintaining a link between form and meaning in wordforms. We use a vector space approach to test the ways this pressure might shape sign language vocabularies, examining how non-arbitrary forms are distributed within the lexicons of two unrelated sign languages. Vector space models situate the representations of words in a multi-dimensional space where the distance between words indexes their relatedness in meaning. Using phonological information from the vocabularies of American Sign Language (ASL) and British Sign Language (BSL), we tested whether increased similarity between the semantic representations of signs corresponds to increased phonological similarity. The results of the computational analysis showed a significant positive relationship between phonological form and semantic meaning for both sign languages, which was strongest when the sign language lexicons were organized into clusters of semantically related signs. The analysis also revealed variation in the strength of patterns across the form-meaning relationships seen between phonological parameters within each sign language, as well as between the two languages. This shows that while the connection between form and meaning is not entirely language specific, there are cross-linguistic differences in how these mappings are realized for signs in each language, suggesting that arbitrariness as well as cognitive or cultural influences may play a role in how these patterns are realized. 
The results of this analysis not only contribute to our understanding of the distribution of non-arbitrariness in sign language lexicons, but also demonstrate a new way that computational modeling can be harnessed in lexicon-wide investigations of sign languages.
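
One simple way to probe the relationship described in this abstract is to correlate pairwise semantic similarity with pairwise phonological similarity across a lexicon. The sketch below is an assumption about how such a test could look, not the paper's method: it uses cosine similarity between semantic vectors, the fraction of matching phonological parameters as form similarity, and a Pearson correlation over all sign pairs. All function names and the lexicon layout are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def phon_similarity(p, q):
    """Fraction of the phonological parameters in p whose values match q."""
    return sum(1 for key in p if p[key] == q.get(key)) / len(p)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def form_meaning_correlation(lexicon):
    """Correlate semantic and phonological similarity over all sign pairs.

    `lexicon` maps a gloss to a (semantic_vector, phonology_dict) pair;
    this layout is an illustrative assumption.
    """
    sem, phon = [], []
    for a, b in combinations(lexicon.values(), 2):
        sem.append(cosine(a[0], b[0]))
        phon.append(phon_similarity(a[1], b[1]))
    return pearson(sem, phon)
```

Pairwise similarities are not independent of one another, so a significance test on this correlation would normally need a permutation-based procedure rather than a standard p-value.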

Fingerspelling recognition in the wild with iterative visual attention

Shi, Rio, Keane, et al. 2019

Preprint

American Sign Language fingerspelling recognition in the wild

Shi, Rio, Keane, et al. 2018

Preprint
