“…While the prior SL literature focuses largely on techniques such as Hidden Markov Models (HMMs) for sequence modeling over handcrafted features, recent studies employ 2D/3D CNN- and RNN-based architectures that operate directly on raw frames or skeleton joint information (Aran, 2008; Camgöz et al., 2016a; Koller et al., 2016, 2019; Zhang et al., 2016; Mittal et al., 2019; Abdullahi and Chamnongthai, 2022; Samaan et al., 2022). More recently, Transformer-based architectures have become popular for SLR and Sign Language Translation (SLT) tasks due to their success in domains such as Natural Language Processing (NLP) and Speech Processing (SP) (Vaswani et al., 2017; Camgoz et al., 2020b; Rastgoo et al., 2020; Boháček and Hrúz, 2022; Cao et al., 2022; Chen et al., 2022; Hrúz et al., 2022; Hu et al., 2022; Xie et al., 2023).…”