2022
DOI: 10.48550/arxiv.2203.13628
Preprint
DeLoRes: Decorrelating Latent Spaces for Low-Resource Audio Representation Learning

Abstract: Inspired by recent progress in self-supervised learning for computer vision, in this paper we introduce, through the DeLoRes learning framework, two new general-purpose audio representation learning approaches: DeLoRes-S and DeLoRes-M. Our main objective is to make our network learn representations in a resource-constrained setting (both data and compute) that generalize well across a diverse set of downstream tasks. Inspired by the Barlow Twins objective function, we propose to learn embeddings…
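The abstract references the Barlow Twins objective, which combines an invariance term (embeddings of two views of the same audio should agree) with a redundancy-reduction term (different embedding dimensions should be decorrelated). A minimal NumPy sketch of that cross-correlation loss follows; the function name, the `lam` weight, and the batch shapes are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Barlow Twins-style loss on two batches of embeddings (batch x dim).

    on_diag pulls the cross-correlation diagonal toward 1 (invariance);
    off_diag pushes off-diagonal correlations toward 0 (redundancy reduction).
    """
    # Standardize each embedding dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / z_a.std(axis=0)
    z_b = (z_b - z_b.mean(axis=0)) / z_b.std(axis=0)
    n = z_a.shape[0]
    c = z_a.T @ z_b / n  # dim x dim cross-correlation matrix
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return on_diag + lam * off_diag
```

With identical views the diagonal term vanishes, so the loss reduces to the (weighted) off-diagonal decorrelation penalty; anti-correlated views are penalized heavily by the invariance term.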

Cited by 2 publications (4 citation statements)
References 18 publications
“…We propose an augmentation layer for generating various views from raw waveforms and an encoding layer for learning meaningful representations. Although performance differences are observed depending on the downstream tasks, WaveBYOL generally shows competitive performance compared to the previously proposed models [12], [14], [15], [17], [18], [19].…”
Section: Introduction
confidence: 79%
“…In frozen-model evaluation, a linear classifier with a multilayer perceptron (MLP) layer is trained to classify a new dataset based on top of the frozen pretrained network, and in fine-tuning, we allow all weights to vary during training. In the frozen-model evaluation experiment, WaveBYOL is compared with COLA [12], DeLoRes [15], BYOL-A [17], [18], and ATST [19], and in the fine-tuning experiment, it is compared with COLA, DeLoRes, SSAST [14], and ATST.…”
Section: Model Training and Performance Evaluation
confidence: 99%
“…BEST-RQ [9] masks the speech input and feeds the masks into an encoder to learn masked parts of speech based on the unmasked part through random-projection quantizers. DeLoRes [10] learns general purpose audio representations through an invariance and redundancy reduction based objective function. wav2vec 2.0 [3] enhances vq-wav2vec through a single-stage training by masking the input speech data into the latent space and then solves a contrastive task defined over a quantization of the latent representations by computing the similarity between the predicted masked vectors and original vectors.…”
Section: Introduction
confidence: 99%
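The last statement describes wav2vec 2.0's contrastive task: the similarity between a predicted masked vector and the true (quantized) vector is scored against distractor vectors. A minimal InfoNCE-style sketch of that scoring, assuming cosine similarity and an illustrative temperature (names and defaults are not from the cited papers):

```python
import numpy as np

def contrastive_loss(pred, candidates, pos_idx, temperature=0.1):
    """InfoNCE-style loss for one masked time step.

    pred:       predicted vector for the masked position (dim,)
    candidates: true quantized vector plus distractors (k x dim)
    pos_idx:    row index of the true vector in `candidates`
    """
    # Cosine similarity between the prediction and every candidate.
    pred = pred / np.linalg.norm(pred)
    candidates = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    sims = candidates @ pred / temperature
    # Negative log-probability of picking the true vector.
    log_probs = sims - np.log(np.exp(sims).sum())
    return -log_probs[pos_idx]
```

The loss is small when the prediction points toward the true quantized vector and large when a distractor is closer, which is the behavior the quoted description attributes to wav2vec 2.0's training objective.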