2023
DOI: 10.1016/j.patcog.2022.109233
|View full text |Cite
|
Sign up to set email alerts
|

Multi-scale local-temporal similarity fusion for continuous sign language recognition

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
2
1

Citation Types

0
5
0

Year Published

2023
2023
2024
2024

Publication Types

Select...
9
1

Relationship

0
10

Authors

Journals

citations
Cited by 24 publications
(5 citation statements)
references
References 9 publications
0
5
0
Order By: Relevance
“…Since there's no unequivocal alignment between sign videotape frames and corresponding facades, it's essential to capture the ne-granulated buff-position details. ( 17).…”
Section: Related Work (Literature Review)mentioning
confidence: 99%
“…Since there's no unequivocal alignment between sign videotape frames and corresponding facades, it's essential to capture the ne-granulated buff-position details. ( 17).…”
Section: Related Work (Literature Review)mentioning
confidence: 99%
“…While the prior SL literature focuses more on techniques such as Hidden Markov Models (HMMs) for sequence modeling after extracting handcrafted features, recent studies follow the idea of employing 2D-3D CNN and RNN-based architectures in which frames or skeleton joint information are directly used (Aran, 2008 ; Camgöz et al, 2016a ; Koller et al, 2016 , 2019 ; Zhang et al, 2016 ; Mittal et al, 2019 ; Abdullahi and Chamnongthai, 2022 ; Samaan et al, 2022 ). More recently, Transformer based architectures have become popular on SLR and Sign Language Translation (SLT) tasks due to their success in domains such as Natural Language Processing (NLP) and Speech Processing (SP) (Vaswani et al, 2017 ; Camgoz et al, 2020b ; Rastgoo et al, 2020 ; Boháček and Hrúz, 2022 ; Cao et al, 2022 ; Chen et al, 2022 ; Hrúz et al, 2022 ; Hu et al, 2022 ; Xie et al, 2023 ).…”
Section: Related Workmentioning
confidence: 99%
“…CNN is a neural network meant to process input stored in arrays such as an image, which is ideally a two-dimensional (2D) array of pixels. CNNs are typically used with spatial or temporal ordering and consists of three layers: convolution layers, pooling layers, and classification layer [165], [170], [171], [172]. The authors in [173] proposed a framework for DL techniques in cyber-security, and analyzed CNN, RNN, and DNN.…”
Section: Deep Learning (Dl) Techniquesmentioning
confidence: 99%