Interspeech 2016
DOI: 10.21437/interspeech.2016-805

Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Recognition

Abstract: We propose a novel method for Acoustic Event Detection (AED). In contrast to speech, sounds coming from acoustic events may be produced by a wide variety of sources. Furthermore, distinguishing them often requires analyzing an extended time period due to the lack of a clear sub-word unit. In order to incorporate the long-time frequency structure for AED, we introduce a convolutional neural network (CNN) with a large input field. In contrast to previous works, this enables to train audio event detection end-to-…

Year published: 2018 to 2024



Cited by 140 publications (99 citation statements)
References 33 publications
“…We augment the data using standard techniques [6]: random cut of the 8-second training sample, random amplification of each source from (0.75, 1.25), random selection of the left or the right channel and shuffling of the sources between different tracks in half of the training batch. We train using RAdam [7] with the Lookahead optimizer [8] for a max of 250 iterations.…”
Section: Methods (mentioning, confidence: 99%)
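The augmentation pipeline quoted above can be sketched as follows. This is a minimal illustration of the described steps (random cut of the 8-second sample, random amplification in (0.75, 1.25), random left/right channel selection); the sample rate, array layout, and function name are assumptions for illustration, not details from the cited work, and the batch-level source shuffling is omitted.

```python
import numpy as np

SAMPLE_RATE = 16000      # assumed sample rate, not stated in the excerpt
SEGMENT_SECONDS = 8      # the excerpt trains on 8-second samples


def augment(stereo_audio, rng):
    """Augment one stereo clip (shape: [2, num_samples]) as described:
    random cut, random gain in (0.75, 1.25), random channel selection."""
    seg_len = SAMPLE_RATE * SEGMENT_SECONDS
    # random cut: pick a random window of the fixed segment length
    start = rng.integers(0, stereo_audio.shape[1] - seg_len + 1)
    cut = stereo_audio[:, start:start + seg_len]
    # random amplification drawn from (0.75, 1.25)
    gain = rng.uniform(0.75, 1.25)
    # random selection of the left or the right channel
    channel = rng.integers(0, 2)
    return gain * cut[channel]
```

Applied per training example, this yields a mono 8-second segment with a randomly scaled amplitude, matching the per-source augmentations listed in the excerpt.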
“…This framework maps audio bases, extracted by non-negative matrix factorization (NMF), to the detected visual objects. In recent year, audio event detection (AED) [8,29,36] has received attention in the research community. Most of the AED methods locate audio events and then classify each event.…”
Section: Related Work (mentioning, confidence: 99%)
“…Meanwhile, tagging the sound events consumes amounts of manpower. To address this problem, two kinds of approaches have been proposed: one efficiently making use of limited data [15][16][17][18][19] and the other based on additional data [3,20,21]. In the Figure 1: Architecture of the end-to-end audio classification network.…”
Section: Related Work (mentioning, confidence: 99%)
“…first type of method, Takahashi et al [15] mixed two different sounds belonging to one class to extend the training distribution. Mixup [17] interpolated the training data and trained the model to output the mixing ratio.…”
Section: Related Work (mentioning, confidence: 99%)
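The mixup variant described in the excerpt above, which interpolates two training inputs and trains the model to predict the mixing ratio, can be sketched minimally. The function interface and the Beta-distribution parameter `alpha` are illustrative assumptions, not details taken from the cited papers.

```python
import numpy as np


def mixup(x1, x2, rng, alpha=0.2):
    """Interpolate two training samples; the mixing ratio `lam`
    becomes the target the model is trained to output."""
    # mixing ratio drawn from Beta(alpha, alpha), so lam is in [0, 1]
    lam = rng.beta(alpha, alpha)
    mixed = lam * x1 + (1.0 - lam) * x2
    return mixed, lam
```

In practice the same interpolation is applied to the waveforms (or spectrograms) of a batch, and the loss compares the model output against `lam` rather than a hard class label.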