Interspeech 2021
DOI: 10.21437/interspeech.2021-2028
Variational Information Bottleneck for Effective Low-Resource Audio Classification

Cited by 8 publications (2 citation statements). References 0 publications.
“…Ever since the adoption of deep neural networks (DNNs), there have been many developments in ASC systems that address the above problems with end-to-end learning methods. In particular, convolutional neural networks (CNNs) [10][11][12] have strongly influenced feature learning from pre-processed raw features such as the log-mel spectrogram, mel-frequency cepstral coefficients (MFCC), constant-Q transform (CQT), gammatone frequency cepstral coefficients (GFCC), and the chromagram [13][14][15]. The common strategy for designing an ASC system is to extract high-level feature maps, consisting of activation scores of events, from the raw audio data and classify them into each scene with global average pooling (GAP) or fully connected (FC) layers through supervised learning.…”
Section: Introduction (mentioning)
confidence: 99%
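
The excerpt above describes the common ASC recipe: a CNN turns a log-mel spectrogram into high-level feature maps, GAP collapses them, and an FC layer scores each scene class. The minimal PyTorch sketch below illustrates that pipeline; the layer sizes, 10-class output, and input shape are illustrative assumptions, not details from the cited work.

import torch
import torch.nn as nn

class SimpleASCNet(nn.Module):
    # Illustrative CNN + GAP + FC classifier for acoustic scene classification.
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.gap = nn.AdaptiveAvgPool2d(1)          # global average pooling over time-frequency
        self.classifier = nn.Linear(64, n_classes)  # FC layer mapping pooled features to scenes

    def forward(self, log_mel: torch.Tensor) -> torch.Tensor:
        # log_mel: (batch, 1, n_mels, n_frames), e.g. a log-mel spectrogram
        h = self.features(log_mel)
        h = self.gap(h).flatten(1)
        return self.classifier(h)                   # per-scene logits

logits = SimpleASCNet()(torch.randn(4, 1, 64, 128))  # 4 clips, 64 mel bins, 128 frames
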
“…To alleviate overfitting and enhance the generalization ability of deep learning models, many methods have been proposed, such as data augmentation [6], dropout [7], batch normalization [8], weight decay [9], pretraining [10], and the variational information bottleneck [11]. The most recently developed method is R-Drop [12], which uses Kullback-Leibler (KL) divergence [13] during training to force the output distributions of two different sub-networks generated by dropout to be consistent.…”
Section: Introduction (mentioning)
confidence: 99%
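
The R-Drop description quoted above can be made concrete with a short sketch: the same dropout-equipped model is run twice on one batch, and a symmetric KL-divergence term keeps the two output distributions consistent alongside the usual cross-entropy. The function name, the alpha weight, and the assumption that the model contains dropout and is in training mode are illustrative, not the authors' implementation.

import torch
import torch.nn.functional as F

def r_drop_loss(model: torch.nn.Module, x: torch.Tensor, y: torch.Tensor,
                alpha: float = 1.0) -> torch.Tensor:
    # Two forward passes: different dropout masks yield two "sub-networks".
    logits1 = model(x)
    logits2 = model(x)

    # Usual supervised cross-entropy on both passes.
    ce = F.cross_entropy(logits1, y) + F.cross_entropy(logits2, y)

    # Symmetric KL divergence between the two predicted distributions (consistency term).
    p1 = F.log_softmax(logits1, dim=-1)
    p2 = F.log_softmax(logits2, dim=-1)
    kl = 0.5 * (F.kl_div(p1, p2, log_target=True, reduction="batchmean")
                + F.kl_div(p2, p1, log_target=True, reduction="batchmean"))
    return ce + alpha * kl
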