2021
DOI: 10.48550/arxiv.2107.00594
Preprint

Pretext Tasks selection for multitask self-supervised speech representation learning

Abstract: Through solving pretext tasks, self-supervised learning leverages unlabeled data to extract useful latent representations that replace traditional input features in the downstream task. In various application domains, including computer vision, natural language processing and audio/speech signal processing, a wide range of features were engineered through decades of research efforts. As it turns out, learning to predict such features has proven to be a particularly relevant pretext task leading to building useful…
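The approach sketched in the abstract, training a shared encoder to predict several classic engineered speech features at once, can be illustrated with a short multitask pretext model. The PyTorch sketch below is a minimal illustration only: the feature names (pitch, energy, MFCCs), dimensions, and architecture are assumptions made for the example and are not taken from the paper.

```python
# Minimal sketch of multitask pretext-task learning for speech:
# a shared encoder feeds several regression heads, each predicting
# one engineered feature (illustrative task set, not the paper's).
import torch
import torch.nn as nn


class MultiTaskPretextModel(nn.Module):
    def __init__(self, n_mels=80, hidden=256, pretext_dims=None):
        super().__init__()
        # Which engineered features to predict, and their dimensionalities
        # (illustrative choices).
        pretext_dims = pretext_dims or {"pitch": 1, "energy": 1, "mfcc": 13}
        # Shared encoder producing the latent representation reused downstream.
        self.encoder = nn.GRU(n_mels, hidden, num_layers=2, batch_first=True)
        # One lightweight regression head per pretext target.
        self.heads = nn.ModuleDict(
            {name: nn.Linear(hidden, dim) for name, dim in pretext_dims.items()}
        )

    def forward(self, mel_frames):
        latent, _ = self.encoder(mel_frames)  # (batch, time, hidden)
        return latent, {name: head(latent) for name, head in self.heads.items()}


def multitask_loss(predictions, targets, weights):
    # Weighted sum of per-task L1 losses; choosing which tasks (and weights)
    # to include is the selection problem the paper studies.
    return sum(weights[name] * nn.functional.l1_loss(pred, targets[name])
               for name, pred in predictions.items())


model = MultiTaskPretextModel()
latent, preds = model(torch.randn(4, 100, 80))  # dummy batch of mel frames
```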


Cited by 5 publications (5 citation statements)
References 52 publications
“…Such pretext tasks include, but are not limited to, applying and predicting parameters of the geometric transformations [26], jigsaw puzzle solving [27], inpainting [28] and colorization [29] of the images, and reversing augmentations. Typically, the pretext task methods have been coupled with other SSL techniques in recent years [30, 31, 32].…”
Section: Related Work (mentioning)
Confidence: 99%
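As a concrete illustration of the transformation-prediction pretext tasks mentioned in the statement above, the hedged sketch below trains a classifier to recognise which rotation (0, 90, 180 or 270 degrees) was applied to an unlabeled image. The encoder and classifier are placeholder assumptions, not the models used in the cited works.

```python
# Hedged sketch of a rotation-prediction pretext task: apply a random
# 90-degree rotation to each image and train the model to predict which one.
import torch
import torch.nn as nn


def rotate_batch(images):
    """Rotate each (C, H, W) image by a random multiple of 90 degrees; return labels."""
    labels = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack([torch.rot90(img, int(k), dims=(1, 2))
                           for img, k in zip(images, labels)])
    return rotated, labels


# Placeholder encoder and rotation classifier (illustrative only).
encoder = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
rotation_head = nn.Linear(32, 4)  # 4 classes: 0, 90, 180, 270 degrees

images = torch.randn(8, 3, 64, 64)  # dummy unlabeled batch
rotated, labels = rotate_batch(images)
loss = nn.functional.cross_entropy(rotation_head(encoder(rotated)), labels)
```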
“…A well-defined pretext task should be defined in a way that enables the model to learn semantic features effectively from data. Nevertheless, it is generally challenging to define a task that can lead to meaningful a priori representations; it is difficult to establish if the surrogate task provides enough training signals to extract features that can be broadly usable by downstream tasks [158]. Therefore, many efforts have been made to design effective pretext tasks.…”
Section: Pretext Task (mentioning)
Confidence: 99%
“…We call it Multi-SSL. While there have been a few works in this area on sound representation [60,72,69], language [66] or visual representation learning [16,22], they have only considered addressing this in the standard multi-tasking framework [16,67]. This work, however, introduces Multi-SSL which investigates different design options to combine multiple SSL tasks and provide insights into the downstream tasks.…”
Section: Classification vs Object Detection Tradeoff (mentioning)
Confidence: 99%