2022
DOI: 10.48550/arxiv.2205.10643
Preprint

Self-Supervised Speech Representation Learning: A Review

citations
Cited by 13 publications
(14 citation statements)
references
References 191 publications
(272 reference statements)
“…Recently, large deep artificial neural network models pre-trained on a massive amount of unlabelled waveform features (e.g. [2, 10, 25]), have demonstrated strong generalisation abilities to ASR and many para-linguistic speech tasks [41]. It would be useful to apply our methods used in this paper to study similar types of models and tasks.…”
Section: Discussion (mentioning)
confidence: 99%
“…SSL makes use of the data's underlying structure. In SSL classification systems, the model is first pre-trained on some pre-auxiliary task to capture rich embeddings from the innate structure of the data [4,8,16,23]. These embeddings are then used for other downstream classification tasks.…”
Section: Self-supervised Framework (mentioning)
confidence: 99%
“…This paradigm is contrasted with the use case of incremental updates to a pre-trained ASR model presented in this work. A comprehensive survey of such methods for speech representation learning are in [45]. The upstream model is trained with a pretext task such as a generative approach to predict or reconstruct the input given a limited view (eg past data, masking) such as autoregressive predictive coding [12].…”
Section: Related Work (mentioning)
confidence: 99%