2017
DOI: 10.1371/journal.pone.0179403

Bird sound spectrogram decomposition through Non-Negative Matrix Factorization for the acoustic classification of bird species

Abstract: Feature extraction for Acoustic Bird Species Classification (ABSC) tasks has traditionally been based on parametric representations that were specifically developed for speech signals, such as Mel Frequency Cepstral Coefficients (MFCC). However, the discrimination capabilities of these features for ABSC could be enhanced by accounting for the vocal production mechanisms of birds, and, in particular, the spectro-temporal structure of bird sounds. In this paper, a new front-end for ABSC is proposed that incorpor…
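
For orientation only: the title refers to decomposing a bird sound spectrogram with Non-Negative Matrix Factorization (NMF). The sketch below is not the authors' front-end, which the truncated abstract does not describe. It only illustrates the generic idea of approximating a non-negative matrix V (frequency bins by time frames) as a product W H of non-negative factors, using the standard multiplicative update rules for the Euclidean cost. The function name `nmf`, the rank, the iteration count and the synthetic "spectrogram" are all assumptions made for illustration.

```python
import numpy as np


def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V as W @ H (hypothetical helper).

    V    : array of shape (n_freq, n_frames), all entries >= 0
    rank : number of spectral basis vectors (columns of W)
    Uses multiplicative updates for the squared Euclidean error.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    # Random positive initialization keeps every update non-negative.
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Update activations H, then spectral bases W.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic stand-in for a magnitude spectrogram (assumption: real data
# would come from a short-time Fourier transform of a recording).
rng = np.random.default_rng(1)
V = rng.random((16, 2)) @ rng.random((2, 40))

W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In this reading, each column of W acts as a spectral pattern and each row of H as its activation over time. How such factors would be turned into classification features is specific to the paper and is not shown here.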

Cited by 19 publications (7 citation statements)
References 32 publications
“…Processing bioacoustic data has typically relied on the initial extraction of features from the audio signal, which are often processed further into handcrafted features prior to their use in machine learning. Researchers in the field are aware of the limitations of this approach, as the most widely used handcrafted features are designed for human speech tasks and focus on spectral characteristics of the sound that may differ from the intended bioacoustic tasks 18, 19. However, traditional deep learning techniques applied directly to the raw waveform have remained largely out of reach, due to limitations primarily associated with insufficient volumes of labelled data.…”
Section: Discussion (mentioning)
confidence: 99%
“…Labelled sounds in a training dataset allow the algorithm to learn and make predictions for new sounds 17. Researchers have used a variety of machine learning algorithms such as Hidden Markov Models, Gaussian Mixture Models and Support Vector Machines 18. More recently the focus has shifted toward the use of deep learning methods such as Convolutional Neural Networks (CNN) 7.…”
Section: Introduction (mentioning)
confidence: 99%
“…The non-negative matrix factorization (NMF) [31], the non-negative tensor factorization (NTF) [32], and the sparse coding [33] are instances of unsupervised learning methods for MGR. In addition to the sparse coding, there are many feature coding methods used in image classification, including the hard coding [34], the soft coding [35], the low-rank sparse coding [36] [37], the vector of locally aggregated descriptor (VLAD) coding [18], and the Fisher vector (FV) coding [38].…”
Section: B Classification Methods (mentioning)
confidence: 99%