2020
DOI: 10.3390/s20133741

Open Set Audio Classification Using Autoencoders Trained on Few Data

Abstract: Open-set recognition (OSR) is a challenging machine learning problem that appears when classifiers are faced with test instances from classes not seen during training. It can be summarized as the problem of correctly identifying instances from a known class (seen during training) while rejecting any unknown or unwanted samples (those belonging to unseen classes). Another problem arising in practical scenarios is few-shot learning (FSL), which appears when there is no availability of a large number of p…
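The open-set decision rule the abstract describes can be sketched as follows. This is an illustrative toy only: `score` stands in for any model confidence (for instance, an autoencoder's negative reconstruction error), and the threshold is a hypothetical tuning parameter, not a value from the paper.

```python
# Illustrative sketch of the generic open-set recognition (OSR)
# decision rule: assign a sample to its best-scoring known class,
# or reject it as "unknown" when no known class is confident enough.
# The scores and threshold below are hypothetical.

def open_set_decision(scores, threshold):
    """scores: dict mapping each known class name to a confidence score."""
    best_class = max(scores, key=scores.get)
    if scores[best_class] < threshold:
        return "unknown"  # reject: likely a sample from an unseen class
    return best_class

# A confident sample is assigned to a known class...
assert open_set_decision({"dog": 0.9, "siren": 0.2}, threshold=0.5) == "dog"
# ...while a low-confidence sample is rejected as unknown.
assert open_set_decision({"dog": 0.3, "siren": 0.2}, threshold=0.5) == "unknown"
```

The rejection threshold is what separates OSR from ordinary closed-set classification, where the best-scoring class is always returned.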

Cited by 12 publications (3 citation statements). References 35 publications.
“…The problem of open-set recognition coupled with few-shot learning is also faced in Ref. [42]. Two different autoencoder architectures with a multi-layer perceptron classifier are designed to identify target sound classes and reject unwanted ones.…”
Section: Related Work
confidence: 99%
“…Auto-Encoders [13, 14] learn a compact latent representation of an input sample and are trained using a reconstruction loss. Using regularizations and constraints, the latent space can be adjusted for various tasks, such as classification [15]. However, the learned features remain largely non-interpretable and abstract.…”
Section: Related Work
confidence: 99%
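The autoencoder idea quoted above — learn a compact latent representation under a reconstruction loss, then reuse that representation for recognition — can be sketched in a few lines. This is a minimal assumption-laden toy, not the paper's architecture: a linear autoencoder with a 1-D latent space, trained by gradient descent on MSE, whose reconstruction error then serves as an open-set novelty score. The data, dimensions, and learning rate are all illustrative choices.

```python
import numpy as np

# Hypothetical sketch: linear autoencoder trained with a
# reconstruction (MSE) loss; reconstruction error is then used
# to reject samples that do not resemble the known class.

rng = np.random.default_rng(0)

# "Known" class: 4-D points lying near a 1-D subspace.
direction = np.array([1.0, 2.0, -1.0, 0.5])
known = rng.normal(size=(200, 1)) * direction + 0.05 * rng.normal(size=(200, 4))

W_enc = 0.1 * rng.normal(size=(4, 1))  # encoder to a 1-D latent space
W_dec = 0.1 * rng.normal(size=(1, 4))  # decoder back to 4-D

lr = 0.02
for _ in range(2000):
    z = known @ W_enc                        # encode
    x_hat = z @ W_dec                        # decode
    g = 2 * (x_hat - known) / len(known)     # d(MSE)/d(x_hat)
    W_dec -= lr * (z.T @ g)                  # decoder gradient step
    W_enc -= lr * (known.T @ (g @ W_dec.T))  # encoder gradient step

def recon_error(x):
    """Per-sample MSE between x and its round trip through the autoencoder."""
    return np.mean((x - (x @ W_enc) @ W_dec) ** 2, axis=1)

unknown = rng.normal(size=(200, 4))  # samples off the learned subspace
# Known samples reconstruct well; unknown ones do not.
assert recon_error(known).mean() < recon_error(unknown).mean()
```

A threshold on `recon_error` would turn this score into the accept/reject decision; the cited works use deeper, regularized architectures for the same purpose.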
“…Although audio signals are natively one-dimensional sequences, most state-of-the-art approaches to audio classification based on CNNs use a two-dimensional (2D) input [12,13]. Usually, these 2D inputs computed from the audio signal are well-known time-frequency representations such as Mel-spectrograms [14,15,16,17] or the output of constant-Q transform [18] (CQT) filterbanks, among others. Time-frequency 2D audio representations are able to accurately extract acoustically meaningful patterns but require a set of parameters to be specified, such as the window type and length, hop size or the number of frequency bins.…”
Section: Introduction
confidence: 99%
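The 2D time-frequency front end described in the quoted passage can be sketched with a plain framed FFT. The window type and length, hop size, and number of frequency bins are exactly the parameters the text says must be specified; the values below are common defaults, not those of the cited papers (a Mel or CQT filterbank would then be applied on top of this magnitude spectrogram).

```python
import numpy as np

# Sketch of a magnitude spectrogram: the 2D time-frequency input
# typically fed to audio CNNs. Parameter values are illustrative.

def spectrogram(signal, n_fft=1024, hop=256):
    window = np.hanning(n_fft)                    # window type and length
    n_frames = 1 + (len(signal) - n_fft) // hop   # hop size sets time resolution
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # rfft yields n_fft // 2 + 1 frequency bins per frame.
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq bins, frames)

sr = 16000
t = np.arange(sr) / sr                # one second of audio at 16 kHz
sig = np.sin(2 * np.pi * 440 * t)     # a 440 Hz test tone

S = spectrogram(sig)                  # (513 bins, 59 frames) with the defaults
# The tone's energy concentrates in bin round(440 * n_fft / sr) = 28.
assert np.argmax(S.mean(axis=1)) == 28
```

Changing `n_fft` and `hop` trades frequency resolution against time resolution, which is the parameter-sensitivity trade-off the passage points out.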