2018
DOI: 10.1250/ast.39.182

Introduction to acoustic event and scene analysis

Abstract: Acoustic event and scene analysis has seen extensive development because it is valuable in applications such as monitoring of elderly people and infants, surveillance, life-logging, and advanced multimedia retrieval. This article reviews the basics of acoustic event and scene analysis, including its terminology and problem definitions, available public datasets, challenges, and recent research trends.

Cited by 44 publications (23 citation statements)
References 38 publications (35 reference statements)
“…This section provides thorough details regarding the parameterization of the proposed framework for classifying cat vocalizations, as well as the respective experimental results and how these compare with classification systems commonly used in the generalized sound classification literature [44], i.e., class-specific and universal HMMs, support vector machines, and echo state networks. The motivation behind these choices aimed at satisfying the condition of including both generative and discriminative pattern recognition schemes [45,46]. Moreover, during early experimentations, it was verified that MFCCs and temporal modulation features capture distinct properties of the structure of the available audio signals as they achieved different recognition rates characterized by diverse misclassifications, thus we decided to use them concurrently.…”
Section: Experimental Set-up and Results
confidence: 99%
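As a rough illustration of the MFCC-based front end mentioned in the excerpt above, the sketch below extracts frame-wise MFCCs and their temporal deltas with librosa; the file name, number of coefficients, and the idea of stacking a second feature stream are illustrative assumptions, not details from the cited work.

```python
# Minimal sketch: frame-wise MFCC extraction for a sound classification front end.
# "cat_call.wav" and the 13-coefficient setting are assumptions for illustration.
import librosa
import numpy as np

y, sr = librosa.load("cat_call.wav", sr=None)        # waveform and its sample rate
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # (13, n_frames)
delta = librosa.feature.delta(mfcc)                  # first-order temporal deltas

# Stack static and dynamic coefficients into one feature matrix per frame;
# a separate temporal-modulation front end could be concatenated the same way.
features = np.vstack([mfcc, delta]).T                # (n_frames, 26)
print(features.shape)
```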
“…Conventional CRNN-based approaches achieve reasonable event detection performances when there is a sufficient amount of training sound data. However, since recording and annotating environmental sounds are very time-consuming [1], in many situations, the conventional CRNN-based methods are likely to exhibit degradation in their event detection performance. To address this limitation, we propose a new method of SED using graph Laplacian regularization based on sound event co-occurrence.…”
Section: Motivation
confidence: 99%
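The following sketch shows, under assumed shapes and names, how a graph-Laplacian penalty built from class co-occurrence could be computed; it illustrates the general idea of the regularizer described in the excerpt, not the cited paper's implementation.

```python
# Sketch of a co-occurrence graph-Laplacian penalty (assumed shapes and names).
import numpy as np

def cooccurrence_laplacian(clip_labels):
    """clip_labels: (n_clips, n_classes) multi-hot training annotations."""
    labels = clip_labels.astype(float)
    A = labels.T @ labels              # class-by-class co-occurrence counts
    np.fill_diagonal(A, 0.0)           # ignore self co-occurrence
    D = np.diag(A.sum(axis=1))         # degree matrix
    return D - A                       # graph Laplacian, (n_classes, n_classes)

def laplacian_penalty(frame_probs, L):
    """frame_probs: (n_frames, n_classes) frame-wise event activations.

    tr(P L P^T) = 0.5 * sum_{i,j} A_ij ||p_i - p_j||^2 over class columns,
    so classes that co-occur in training are pushed toward similar activations.
    """
    return np.trace(frame_probs @ L @ frame_probs.T)

# Toy usage: classes 0 and 1 co-occur, class 2 appears alone.
labels = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1]])
L = cooccurrence_laplacian(labels)
penalty = laplacian_penalty(np.random.rand(100, 3), L)
```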
“…Sound event detection (SED) is a task that identifies types of sound and detects their onset and offset [1]. Recently, many works have addressed SED because SED has a large potential for many applications such as monitoring elderly people or infants [2], [3], automatic surveillance [4]- [6], automatic anomaly detection [7], [8], and media retrieval [9].…”
Section: Introduction
confidence: 99%
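To make the onset/offset aspect of SED concrete, here is a minimal sketch that thresholds frame-wise probabilities for one event class and groups contiguous active frames into (onset, offset) times; the threshold and hop size are illustrative assumptions.

```python
# Sketch: frame-wise probabilities -> event onset/offset times for one class.
import numpy as np

def detect_events(frame_probs, threshold=0.5, hop_seconds=0.02):
    """frame_probs: (n_frames,) probabilities for a single event class."""
    active = frame_probs >= threshold
    events, onset = [], None
    for t, a in enumerate(active):
        if a and onset is None:
            onset = t                                    # event starts
        elif not a and onset is not None:
            events.append((onset * hop_seconds, t * hop_seconds))
            onset = None                                 # event ends
    if onset is not None:                                # event still active at clip end
        events.append((onset * hop_seconds, len(active) * hop_seconds))
    return events                                        # list of (onset, offset) in seconds

print(detect_events(np.array([0.1, 0.7, 0.8, 0.2, 0.9, 0.9, 0.1])))
```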
“…To understand the sounds that occur in a given environment, researchers have developed four major back‐end processing approaches to analyzing acoustic scenes and sound events: (i) ASC, (ii) audio tagging, (iii) SED and (iv) anomalous sound detection. In this section, we introduce each problem setting and describe typical approaches for conducting these tasks.…”
Section: Back‐end Techniques For Environmental Sound Processing
confidence: 99%
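As a quick orientation to how these four problem settings differ, the sketch below contrasts the typical output structure of each; the shapes and names are assumptions, not taken from the article.

```python
# Illustrative output structures for the four back-end problem settings (assumed sizes).
import numpy as np

n_frames, n_classes = 500, 10

# (i) Acoustic scene classification: one scene label for the whole clip.
scene_label = "office"

# (ii) Audio tagging: clip-level multi-hot vector of active event classes.
clip_tags = np.zeros(n_classes, dtype=bool)

# (iii) Sound event detection: frame-level activity roll, from which onsets/offsets follow.
event_roll = np.zeros((n_frames, n_classes), dtype=bool)

# (iv) Anomalous sound detection: a score to threshold for normal vs. anomalous.
anomaly_score = 0.0
```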