2017
DOI: 10.1109/taslp.2017.2690563

Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging

Abstract: Environmental audio tagging aims to predict only the presence or absence of certain acoustic events in the interested acoustic scene. In this paper we make contributions to audio tagging in two parts, respectively, acoustic modeling and feature learning. We propose to use a shrinking deep neural network (DNN) framework incorporating unsupervised feature learning to handle the multi-label classification task. For the acoustic modeling, a large set of contextual frames of the chunk are fed into the DNN to perfor…

Cited by 72 publications (65 citation statements)
References 35 publications
“…Several state-of-the-art results on various audio classification tasks have been obtained by using log-Mel spectrograms of raw audio, as features [25]. Convolutional Neural Networks have demonstrated an excellent performance gain in classification of these features [26,10] against other machine learning techniques. It has been shown that using attention layers with ConvNets further enhanced their performance [13].…”
Section: Motivations (mentioning)
confidence: 99%
“…The performance of speech and image recognition systems has been significantly improved with the use of deep neural networks and exploding amount of training data. Audio-related tasks, e.g., Acoustic Scene Classification (ASC) [1,2,3], Sound Event Detection (SED) [4,5,6] and Audio Tagging [7,8,9,10], have also received increasing attention in recent years. They have many real-world applications.…”
Section: Introduction (mentioning)
confidence: 99%
“…When using a large enough dataset that provides satisfactory training data and has a good representation for each different class, many methods have been successful in performing both of the intermediate tasks. A few methods for audio event detection can be found in [9] and [22], while for audio tagging in [12,24,25,1,19,6]. These tasks are less challenging to train for than Figure 1: Factorisation of the full transcription task.…”
Section: Task Factorisation (mentioning)
confidence: 99%
“…In [6], the authors proposed a content-based automatic music tagging algorithm using deep convolutional neural networks. In [24], the authors proposed to use a shrinking deep neural network incorporating unsupervised feature learning to handle the multi-label audio tagging. Furthermore, considering that only chunk level rather than frame-level labels are available, a large set of contextual frames of the chunk were fed into the network to perform this task.…”
Section: Introduction (mentioning)
confidence: 99%