DOI: 10.32657/10356/59272

Sound event recognition in unstructured environments using spectrogram image processing

Citations: cited by 30 publications (47 citation statements)
References: 224 publications (371 reference statements)
“…We set up the standard experiment of the robust audio event recognition task similar to current state-of-the-art works [18,1,6] so that the results are comparable. Audio event database.…”
Section: Databases (mentioning)
confidence: 99%
“…Noise database. As in [18,1,6], we chose four different environmental noises from the NOISEX-92 database [20], including "Destroyer Control Room", "Speech Babble", "Factory Floor 1", and "Jet Cockpit 1". Besides clean signals, we also created noise-corrupted signals by randomly choosing one of four noise signals to add to the clean signals at random starting points.…”
Section: Databases (mentioning)
confidence: 99%
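The noise-corruption step quoted above can be sketched as follows. This is a minimal illustration, not the citing authors' code: the function name `add_noise_at_random_offset` and the `snr_db` parameter are hypothetical, the excerpt does not state a target signal-to-noise ratio, and "random starting points" is read here as a random offset into the chosen noise recording.

```python
import math
import random


def add_noise_at_random_offset(clean, noises, snr_db, rng=random):
    """Mix one randomly chosen noise recording into `clean` at a target SNR.

    `snr_db` is an assumption of this sketch; the quoted text gives no SNR.
    """
    noise = rng.choice(noises)
    if len(noise) < len(clean):
        raise ValueError("noise recording is shorter than the clean signal")
    # Random starting point inside the chosen noise recording.
    start = rng.randrange(len(noise) - len(clean) + 1)
    segment = noise[start:start + len(clean)]

    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(x * x for x in segment) / len(segment)
    if noise_power == 0.0:
        # Silent noise segment: nothing to add.
        return list(clean)
    # Scale the noise so that 10*log10(clean_power / scaled_noise_power) == snr_db.
    gain = math.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, segment)]
```

Passing a seeded `random.Random` instance as `rng` makes the choice of noise type and offset reproducible across runs.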
“…where X_t is a column vector containing the absolute amplitude of the DFT frequency bins and T is the number of DFTs extracted from the signal. The absolute amplitude is favored over the log-amplitude as it has been shown to yield better results for spectrogram image classification in [32] and in our own experiments. The spectrograms are normalized: each frequency bin is divided by the maximum amplitude value contained in a time frame.…”
Section: Feature Extraction (mentioning)
confidence: 78%
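As a rough illustration of the feature described in this statement, the sketch below builds an absolute-amplitude spectrogram and divides each bin by the maximum of its own time frame. The function names, the frame length, the hop size, and the Hann window are all assumptions of this example, and reading the normalization as a per-frame maximum is one interpretation of the quoted sentence, not a confirmed detail of the citing paper.

```python
import cmath
import math


def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Return T frames; each frame holds |DFT| for bins 0..frame_len // 2."""
    n_bins = frame_len // 2 + 1
    # Hann window to limit spectral leakage (an assumption of this sketch).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        column = []
        for k in range(n_bins):
            # Naive DFT of a single bin; an FFT routine would be used in practice.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            column.append(abs(acc))  # absolute amplitude, no logarithm
        frames.append(column)
    return frames


def normalize_per_frame(frames, eps=1e-12):
    """Divide every bin by the maximum amplitude of its own time frame."""
    normalized = []
    for column in frames:
        peak = max(column)
        # eps guards against division by zero on silent frames.
        normalized.append([value / max(peak, eps) for value in column])
    return normalized
```

For a 256-sample signal with the default settings this yields 7 frames of 33 bins each, and every non-silent normalized frame has a largest value of 1.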
“…For example, histograms of oriented gradients (HOG) were used to perform word recognition [31]. In [32], spectrogram amplitudes are quantized and mapped into a color-coded image. Color distributions are then characterized and analyzed.…”
Section: Feature Learning For Speech Analysis (mentioning)
confidence: 99%