2022
DOI: 10.1109/tmm.2021.3109665

PiSLTRc: Position-Informed Sign Language Transformer With Content-Aware Convolution

Cited by 28 publications (12 citation statements); references 37 publications.
“…SL-Transf (Camgoz et al., 2020b) models utilize pre-trained features from CNN-LSTM-HMM and jointly learn sign language recognition and translation. PiSLTRc (Xie et al., 2021) uses position-informed temporal convolution based on SL-Transf. SignBT (Zhou et al., 2021a) uses Sign Back-Translation for data augmentation.…”
Section: Comparison Results (mentioning)
confidence: 99%
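The excerpt above only names PiSLTRc's position-informed temporal convolution without describing it. Below is a minimal sketch, assuming the common pattern of adding sinusoidal position encodings to frame-level features before a 1D convolution over time; all module and parameter names are illustrative, not the authors' implementation.

```python
# Hedged sketch (not the authors' code): a temporal convolution over sign-video
# frame features made "position-informed" by adding sinusoidal position
# encodings before convolving. Sizes and names are illustrative assumptions.
import math
import torch
import torch.nn as nn


def sinusoidal_positions(length: int, dim: int) -> torch.Tensor:
    """Standard Transformer sinusoidal position encodings, shape (length, dim)."""
    pos = torch.arange(length, dtype=torch.float32).unsqueeze(1)
    div = torch.exp(torch.arange(0, dim, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / dim))
    pe = torch.zeros(length, dim)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe


class PositionInformedTemporalConv(nn.Module):
    """1D convolution over time whose input is enriched with position encodings."""

    def __init__(self, dim: int, kernel_size: int = 5):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim) frame-level features
        x = x + sinusoidal_positions(x.size(1), x.size(2)).to(x.device)
        # Conv1d expects (batch, dim, time)
        return self.conv(x.transpose(1, 2)).transpose(1, 2)


feats = torch.randn(2, 100, 512)            # 2 clips, 100 frames, 512-dim features
out = PositionInformedTemporalConv(512)(feats)
print(out.shape)                            # torch.Size([2, 100, 512])
```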
“…carefully design and test different data augmentations and combinations to learn a noise-invariant sentence representation. The mRASP2 model (Pan et al., 2021) leverages monolingual and bilingual data under a unified training framework, introducing contrastive learning and aligned augmentation to close the representation gap between languages. Gao et al. (2021) apply dropout (Srivastava et al., 2014) twice as a minimal data augmentation, obtaining two different embeddings as "positive pairs" for each sentence.…”
Section: Contrastive Learning (mentioning)
confidence: 99%
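Since the excerpt summarizes the dropout-twice trick of Gao et al. (2021) only in words, here is a minimal sketch of the idea with a toy encoder; the encoder architecture, sizes, and temperature are assumptions for illustration, not the SimCSE implementation.

```python
# Sketch of dropout-as-augmentation: encoding the same input twice under
# independent dropout masks yields two embeddings that serve as a "positive
# pair" for an InfoNCE-style contrastive loss. The encoder is a toy stand-in.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(300, 256), nn.ReLU(), nn.Dropout(p=0.1),
                        nn.Linear(256, 128))
encoder.train()                 # keep dropout active so the two passes differ

x = torch.randn(16, 300)        # a batch of 16 (toy) sentence features
z1 = F.normalize(encoder(x), dim=-1)   # first pass, one dropout mask
z2 = F.normalize(encoder(x), dim=-1)   # second pass, a different mask

# InfoNCE: each z1[i] should match z2[i] against all other in-batch embeddings.
temperature = 0.05
logits = z1 @ z2.t() / temperature
loss = F.cross_entropy(logits, torch.arange(z1.size(0)))
print(loss.item())
```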
“…Finally, the proposed multi-level CTC loss is used for training to obtain recognition results. The research in this paper is based on a 1D CNN, using temporal receptive fields of different scales to enhance sequence-modeling capability, followed by the self-attention mechanism of Transformers [21,22] to better capture long-distance dependencies and improve the discriminative power of features. Finally, the proposed multi-level CTC loss not only decodes the temporal features better but also lets the parameters of the shallow network update well, efficiently training the frame-wise feature-extraction network and the temporal-modeling network and further improving recognition performance.…”
Section: Introduction (mentioning)
confidence: 99%
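A minimal sketch of the pipeline this excerpt describes, under assumed sizes: parallel 1D convolutions with different kernel widths for multi-scale temporal receptive fields, one Transformer self-attention layer for long-range dependencies, and CTC losses attached at two depths as a stand-in for the proposed multi-level CTC loss. This is an illustration of the idea, not the paper's implementation.

```python
# Sketch: multi-scale temporal convolution -> self-attention -> CTC losses at
# two depths, so gradients reach the shallow layers directly. Sizes are toy.
import torch
import torch.nn as nn


class MultiScaleTemporalConv(nn.Module):
    """Parallel Conv1d branches with different kernel sizes, fused by a linear."""

    def __init__(self, dim: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(dim, dim, k, padding=k // 2) for k in kernel_sizes)
        self.proj = nn.Linear(dim * len(kernel_sizes), dim)

    def forward(self, x):                       # x: (batch, time, dim)
        xt = x.transpose(1, 2)                  # (batch, dim, time)
        y = torch.cat([b(xt) for b in self.branches], dim=1)
        return self.proj(y.transpose(1, 2))     # back to (batch, time, dim)


dim, vocab = 256, 100
conv = MultiScaleTemporalConv(dim)
attn = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
head_low, head_high = nn.Linear(dim, vocab), nn.Linear(dim, vocab)
ctc = nn.CTCLoss(blank=0)

x = torch.randn(2, 80, dim)                     # 2 videos, 80 frames each
low = conv(x)                                   # shallow multi-scale features
high = attn(low)                                # long-range temporal modeling

targets = torch.randint(1, vocab, (2, 10))      # toy gloss label sequences
in_lens = torch.full((2,), 80, dtype=torch.long)
tgt_lens = torch.full((2,), 10, dtype=torch.long)


def ctc_of(feats, head):
    # CTC expects log-probs shaped (time, batch, vocab)
    log_probs = head(feats).log_softmax(-1).transpose(0, 1)
    return ctc(log_probs, targets, in_lens, tgt_lens)


# "Multi-level": supervise both the shallow and the deep features.
loss = ctc_of(low, head_low) + ctc_of(high, head_high)
print(loss.item())
```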