2020
DOI: 10.1109/taslp.2019.2953350
Adaptive Multi-Scale Detection of Acoustic Events

Abstract: The goal of acoustic (or sound) event detection (AED or SED) is to predict the temporal positions of target events in given audio segments. This task plays a significant role in safety monitoring, acoustic early warning, and other scenarios. However, the scarcity of data and the diversity of acoustic event sources make AED a difficult problem, especially for the prevalent data-driven methods. In this paper, we start by analyzing acoustic events according to their time-frequency domain properties, showing that di…

Cited by 14 publications (9 citation statements)
References 57 publications
“…This section details the architecture and evaluation metrics for rare SED using a CNN and visual object detectors. A simple CNN architecture was selected for this experiment; it did not involve any CNN variant such as guided learning [2], a recurrent neural network [5,6], or weakly-supervised learning [21,22]. Those architectures have produced the best results in previous challenges on pre-existing, annotated data.…”
Section: Methods
confidence: 99%
“…Among these, the recurrent neural network (RNN)-based SED methods [3,4] adopt a large temporal context and perform relatively better than a basic CNN structure. Hybrid structures containing both CNN and RNN layers, known as convolutional recurrent neural networks (CRNNs) [5,6], are being developed for SED; they integrate both the spatial and the temporal properties of the audio signal. A CRNN for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3D) space has been proposed by Adavanne et al. [7].…”
Section: A Literature Review
confidence: 99%
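The CRNN structure described in the excerpt above — convolutional layers for spatial (spectral) features followed by a recurrent layer for temporal context — can be sketched as follows. This is a minimal illustrative sketch in PyTorch; all layer sizes, the number of mel bands, and the class count are assumptions for demonstration, not the configuration of any cited paper.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Minimal CRNN sketch for frame-wise sound event detection.

    Input:  (batch, time, n_mels) mel-spectrogram frames.
    Output: (batch, time, n_classes) per-frame event probabilities.
    Dimensions are illustrative assumptions.
    """

    def __init__(self, n_mels=40, n_classes=6, hidden=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            # Pool only along frequency so the time resolution is preserved
            # for frame-level predictions.
            nn.MaxPool2d(kernel_size=(1, 4)),
        )
        self.gru = nn.GRU(16 * (n_mels // 4), hidden,
                          batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):
        x = x.unsqueeze(1)                  # (B, 1, T, F)
        x = self.conv(x)                    # (B, 16, T, F // 4)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        x, _ = self.gru(x)                  # (B, T, 2 * hidden)
        # Sigmoid (not softmax): overlapping events make SED multi-label.
        return torch.sigmoid(self.fc(x))
```

The frequency-only pooling and the sigmoid output are common SED design choices: the former keeps one prediction per input frame, and the latter allows several events to be active simultaneously.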
“…Ding and He [95] proposed an adaptive multi-scale detection method that combines the idea of an hourglass network with a bidirectional GRU (BGRU). It is a CRNN, but a considerably more sophisticated one.…”
Section: Figure 11: Flowchart of a CRNN
confidence: 99%
“…The resulting values were then summed and sent to the output layer. In their study, Ding and He [95] proposed using a 4-layer hourglass network with a 3-layer bidirectional GRU at each scale. With this architecture, Ding and He [95] achieved a single-second F1-score of 48.7% with an ER of 0.7821 on the TUT-SED 2016 evaluation dataset, and a single-second F1-score of 43.6% with an ER of 0.7723 on the TUT-SED 2017 evaluation dataset.…”
Section: Figure 11: Flowchart of a CRNN
confidence: 99%
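The hourglass-with-BGRU idea in the excerpt above — processing the signal at several time resolutions, each with its own recurrent layer, then fusing the scales back to full resolution — can be illustrated with a deliberately simplified two-scale sketch. This is not the architecture of Ding and He [95] (which uses a 4-layer hourglass with 3-layer BGRUs); every dimension and the fusion scheme here are assumptions chosen to keep the sketch short.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleBlock(nn.Module):
    """One time scale: a bidirectional GRU projected back to the feature dim."""

    def __init__(self, dim, hidden):
        super().__init__()
        self.gru = nn.GRU(dim, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, dim)

    def forward(self, x):               # x: (B, T, dim)
        h, _ = self.gru(x)
        return self.proj(h)

class MultiScaleSED(nn.Module):
    """Hypothetical two-scale hourglass-style detector (illustrative only)."""

    def __init__(self, n_mels=40, dim=64, hidden=32, n_classes=6):
        super().__init__()
        self.inp = nn.Linear(n_mels, dim)
        self.fine = ScaleBlock(dim, hidden)    # full time resolution
        self.coarse = ScaleBlock(dim, hidden)  # half time resolution
        self.out = nn.Linear(dim, n_classes)

    def forward(self, x):                      # x: (B, T, n_mels), T even
        x = torch.relu(self.inp(x))
        fine = self.fine(x)
        # Downsample time by 2, run the coarse-scale BGRU, upsample back,
        # and fuse the scales with an additive skip connection.
        down = F.avg_pool1d(x.transpose(1, 2), 2).transpose(1, 2)
        up = F.interpolate(self.coarse(down).transpose(1, 2),
                           scale_factor=2).transpose(1, 2)
        return torch.sigmoid(self.out(fine + up))
```

The point of the multi-scale design is that short, impulsive events are resolved at the fine scale while long, stationary events benefit from the larger effective context of the coarse scale.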
“…RNNs have been successfully applied to various tasks, such as data-driven modelling [39], image captioning [40], sentiment analysis [41], and speech recognition [42]. For example, Ergen and Kozat [43] studied online training of the LSTM architecture in a distributed network of nodes for regression and introduced online distributed training algorithms for variable-length data sequences. Zhao et al. [44] proposed a new approach, the CAM-RNN, to extract the most correlated visual and text features for video captioning; it is composed of three parts: a visual attention module, a text attention module, and a balancing gate.…”
Section: Model Building
confidence: 99%