Interspeech 2018
DOI: 10.21437/interspeech.2018-1705

Deep Convex Representations: Feature Representations for Bioacoustics Classification

Abstract: In this paper, a deep convex matrix factorization framework is proposed for bioacoustics classification. Archetypal analysis, a form of convex non-negative matrix factorization, is used for acoustic modelling at each level of this framework. At the first level, the input feature matrix is factorized into an archetypal dictionary and corresponding convex representations. The representation matrix obtained at the first level is further factorized into a dictionary and convex representations at the second level. This hie…
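The two-level factorization described in the abstract can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: it assumes projected-gradient updates with Euclidean simplex projections, and all function names, dimensions, and step sizes are choices made here for the sketch.

```python
import numpy as np

def project_simplex(V):
    """Euclidean projection of each column of V onto the probability simplex."""
    U = -np.sort(-V, axis=0)                       # sort each column descending
    css = np.cumsum(U, axis=0)
    k = np.arange(1, V.shape[0] + 1)[:, None]
    cond = U - (css - 1.0) / k > 0                 # true for a prefix of rows
    rho = cond.cumsum(axis=0).argmax(axis=0)       # last row where cond holds
    theta = (css[rho, np.arange(V.shape[1])] - 1.0) / (rho + 1.0)
    return np.maximum(V - theta, 0.0)

def archetypal_analysis(X, k, iters=300, seed=0):
    """One level of convex NMF: X (d x n) ~ (X @ B) @ A.

    Columns of B (n x k) and A (k x n) lie on the simplex, so the dictionary
    W = X @ B consists of archetypes (convex combinations of data columns)
    and A holds the convex representations.
    """
    d, n = X.shape
    rng = np.random.default_rng(seed)
    B = project_simplex(rng.random((n, k)))
    A = project_simplex(rng.random((k, n)))
    for _ in range(iters):
        W = X @ B
        # projected-gradient step for A, step size 1/Lipschitz
        LA = np.linalg.norm(W, 2) ** 2 + 1e-12
        A = project_simplex(A - (W.T @ (W @ A - X)) / LA)
        # projected-gradient step for B
        LB = (np.linalg.norm(X, 2) ** 2) * (np.linalg.norm(A, 2) ** 2) + 1e-12
        B = project_simplex(B - (X.T @ (X @ B @ A - X) @ A.T) / LB)
    return X @ B, A                                # dictionary, representations

# Two-level "deep" variant: the level-1 representations are factorized again.
rng = np.random.default_rng(1)
X = rng.random((20, 60))                           # stand-in for a feature matrix
W1, H1 = archetypal_analysis(X, k=10)              # level 1
W2, H2 = archetypal_analysis(H1, k=4)              # level 2 re-factorizes H1
```

The representations at the deepest level (here `H2`) would then feed a downstream classifier, as in the paper's pipeline.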

Cited by 10 publications (4 citation statements)
References 19 publications
“…Hsu et al [109] apply deep MF to the spectrogram matrix of a set of spoken sentences to extract several layers of frequential basis features, and it separates the speakers in a mixture better than a simple one-layer NMF. Thakur et al [110] used deep AA to extract sources from the spectrograms of bioacoustics signals, with the dictionaries learnt at the first layers corresponding to archetypes on the convex hull of the data, while deeper atoms lie closer to the centre of the data. The classification accuracy obtained with an SVM on the inner representations H_L is higher than that of other state-of-the-art classification methods.…”
Section: Audio Processing (mentioning; confidence: 99%)

Deep matrix factorizations. De Handschutter, Gillis, Siebert (2020), preprint.
“…The three mentioned shallow learning baselines include polynomial kernel based extreme learning machines (KELM) 30 and random forest classifiers, wherein (1) the KELM is trained on low-level audio descriptors while the latter is trained on (2) the unsupervised and (3) the supervised feature representations. The unsupervised feature representations are obtained from spherical K-means 26 (SKM), while the supervised representations 28 are acquired by deep convex matrix factorization (DCR). The input feature representations (Mel-spectrogram and compressed spectral frames) used in the respective studies are also used here.…”
Section: B Comparative Studies (mentioning; confidence: 99%)
“…Stowell and Plumbley 26 proposed spherical K-means based unsupervised feature learning for large-scale bird species classification. Building on their work, Thakur et al 27,28 proposed using archetypal analysis 29 and deep archetypal analysis to obtain supervised convex representations for bioacoustic classification. Kernel-based extreme learning machines are used by Qian et al.…”
Section: Introduction (mentioning; confidence: 99%)
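The spherical K-means feature learning mentioned in the statements above can be sketched as follows. This is a minimal illustrative version, not the cited authors' implementation; the function name, initialization, and feature mapping are assumptions made for the sketch.

```python
import numpy as np

def spherical_kmeans(X, k, iters=50, seed=0):
    """X: (n, d) rows are feature frames. Centroids are kept unit-norm,
    so assignment uses cosine similarity rather than Euclidean distance."""
    rng = np.random.default_rng(seed)
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    C = Xn[rng.choice(len(Xn), size=k, replace=False)]  # init from data rows
    for _ in range(iters):
        assign = (Xn @ C.T).argmax(axis=1)              # nearest by cosine
        for j in range(k):
            members = assign == j
            if members.any():
                c = Xn[members].sum(axis=0)
                C[j] = c / (np.linalg.norm(c) + 1e-12)  # re-normalize centroid
    # unsupervised features: cosine similarities to the learned dictionary
    return Xn @ C.T, C

rng = np.random.default_rng(0)
F = rng.random((200, 16))            # stand-in for spectral frame features
feats, C = spherical_kmeans(F, k=8)  # (200, 8) feature matrix, 8 centroids
```

Per-frame features would typically be pooled over time (e.g. max or mean across frames of a recording) before training the downstream classifier.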
“…Despite weather noise and a wide variety of bird call types, machine learning approaches, particularly deep learning, can achieve very high recognition rates on remotely monitored auditory data [6]. There have been numerous endeavours in the literature to classify birds from pre-segmented, single-label audio recordings [7,8,9,10,11]. Multi-label bird classification is difficult because of time-frequency overlap in the audio recordings.…”
Section: Introduction (mentioning; confidence: 99%)