Interspeech 2021
DOI: 10.21437/interspeech.2021-1310
Self-Supervised Phonotactic Representations for Language Identification


Cited by 8 publications (5 citation statements)
References: 0 publications
“…Ramesh et al. (2021) [147] explored self-supervised approaches to build phonotactic-based LID systems for seven Indian languages: Bengali, Hindi, Malayalam, Marathi, Punjabi, Tamil, and Telugu. The audio was collected from video streaming websites…”
Section: Das et al. (2020)
Mentioning confidence: 99%
“…For languages with little to no standard resources available, zero-shot learning [196] can be an effective solution. Self-supervised learning [8,147,149,174] is also a promising approach for developing speech technologies for severely low-resourced Indian languages, where verified ground-truth labels are very challenging to collect…”
Section: Issue of Low-resource
Mentioning confidence: 99%
“…Here, we use the wav2vec2-base-960h pretrained model from 2 without further self-supervised training or supervised fine-tuning, extracting embeddings as the low-level descriptors (LLDs) that form the basis for more elaborate feature representations. In [26], the authors showed that aggregating wav2vec 2.0 embeddings outperforms supervised counterparts and that aggregation is suitable for extracting phonotactic constraints. In [21], the authors showed the effectiveness of using different layers of the pretrained wav2vec model on emotion recognition tasks…”
Section: Transformer-based Acoustic Features
Mentioning confidence: 99%
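For context, a minimal sketch of this kind of embedding extraction, assuming the publicly hosted facebook/wav2vec2-base-960h checkpoint, the Hugging Face transformers and PyTorch libraries, and 16 kHz mono input. The function name, layer index, and mean-pooling aggregation are illustrative assumptions, not the cited authors' exact pipeline.

```python
# Sketch: frame-level wav2vec 2.0 embeddings as low-level descriptors (LLDs),
# plus a simple mean-pooled utterance vector. Assumes 16 kHz mono audio.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

processor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2Model.from_pretrained(
    "facebook/wav2vec2-base-960h", output_hidden_states=True
)
model.eval()  # frozen: no further self-supervised training or fine-tuning

def wav2vec2_llds(waveform, sampling_rate=16000, layer=-1):
    """Return (frame_embeddings, utterance_vector) from one encoder layer."""
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states: one (1, T, 768) tensor per transformer layer
    frames = outputs.hidden_states[layer].squeeze(0)  # (T, 768) frame-level LLDs
    utterance = frames.mean(dim=0)                    # simple aggregation over time
    return frames, utterance
```

Selecting a different `layer` index is one way to probe the layer-wise effects mentioned above; mean pooling is only the simplest of the possible aggregation schemes.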
“…The fine-tuned model can then be used as a feature extractor, with the extracted representations subsequently serving as input to the downstream model [24,91]…”
Section: Introduction
Mentioning confidence: 99%
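A minimal sketch of that feature-extractor pattern, assuming a frozen wav2vec 2.0-style encoder whose forward pass returns an object with a last_hidden_state tensor. The class name, hidden sizes, and seven-language output are hypothetical choices for illustration, not the implementation described in the cited work.

```python
# Sketch: a fine-tuned encoder is frozen and used as a feature extractor;
# its pooled representations feed a small trainable downstream classifier.
import torch
import torch.nn as nn

class DownstreamLID(nn.Module):
    def __init__(self, encoder, feature_dim=768, num_languages=7):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():  # freeze the fine-tuned encoder
            p.requires_grad = False
        self.head = nn.Sequential(           # downstream model (the only trained part)
            nn.Linear(feature_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_languages),
        )

    def forward(self, input_values):
        with torch.no_grad():
            feats = self.encoder(input_values).last_hidden_state  # (B, T, D)
        return self.head(feats.mean(dim=1))  # pool over time, then classify
```

During training only the head's parameters receive gradients, which is what makes the frozen encoder a pure feature extractor for the downstream task.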