2022
DOI: 10.1109/jstsp.2022.3195430
Pretext Tasks Selection for Multitask Self-Supervised Audio Representation Learning

Cited by 9 publications (4 citation statements). References: 66 publications.
“…More recently, the success of BERT in NLP has drawn attention from researchers in acoustic signal processing. Some pioneering works [7–10, 21, 22] have shown the effectiveness of adapting BERT and other self-supervised approaches to Automatic Speech Recognition (ASR). By designing pre-training objectives specific to the audio modality, it is possible to adapt BERT-like models to music and other audio domains.…”
Section: Related Work
confidence: 99%
“…This work extends this approach in two ways: first, applying it to domain adaptation in a supervised setting, and second, extending it to the speech recognition task. For this we use the Hilbert-Schmidt Independence Criterion (HSIC) [25], a kernel-based dependence estimator that was also validated for pretext task selection in previous work [26]. The lower the HSIC estimate, the more conditionally independent the two sets are and the better the augmentations should be.…”
Section: Motivation and Technical Description
confidence: 99%
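For readers unfamiliar with the dependence measure mentioned in the excerpt above, the following is a minimal NumPy sketch of the biased empirical HSIC estimator with Gaussian kernels and a median-heuristic bandwidth. It only illustrates the general estimator; it is not the implementation used in the cited works, and the function names are illustrative.

```python
import numpy as np

def rbf_kernel(X, sigma=None):
    # Gaussian (RBF) kernel matrix; bandwidth from the median heuristic if not given.
    sq = np.sum(X**2, axis=1)
    sq_dists = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    if sigma is None:
        sigma = np.sqrt(0.5 * np.median(sq_dists[sq_dists > 0]))
    return np.exp(-sq_dists / (2.0 * sigma**2))

def hsic(X, Y):
    # Biased empirical HSIC estimate between two sample sets (rows are samples).
    n = X.shape[0]
    K, L = rbf_kernel(X), rbf_kernel(Y)
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Usage: a higher value indicates stronger statistical dependence.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
print(hsic(X, X + 0.1 * rng.normal(size=(200, 16))))  # dependent pair -> larger HSIC
print(hsic(X, rng.normal(size=(200, 16))))            # independent pair -> near zero
```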
“…Self-supervised learning (SSL) enables the use of large amounts of unlabelled data to obtain substantial performance improvements on a variety of downstream tasks without relying on manual annotations. Various approaches have been introduced, including predictive coding [1,2], multi-task learning [3,4], autoencoding techniques [5], and contrastive learning [6,7]. In this context, data augmentation has become an important part of many self-supervised approaches.…”
Section: Introduction
confidence: 99%
“…This is particularly beneficial as it reduces the need for expensive and imprecise manual annotation. Various approaches have been introduced in the literature, including predictive coding [1], multi-task learning [2,3], auto-encoding techniques [4], and contrastive learning [5].…”
Section: Introduction
confidence: 99%