2017
DOI: 10.1007/978-3-319-69900-4_40
Novel Phase Encoded Mel Filterbank Energies for Environmental Sound Classification


Cited by 31 publications (9 citation statements). References 13 publications.
“…The second is an end-to-end method for dealing with raw audio data by using a shallow and wide raw feature extractor [46,47] rather than a Fourier transform. The third uses improved learning techniques, such as data augmentation [47][48][49], pre-processing (or post-processing) [50,51], and other learning methods, such as self- (or weakly) supervised learning [10,[37][38][39][40][41]47,51].…”
Section: Audio Feature Extraction
confidence: 99%
“…In the initialization section, the hyper-parameters of the FFT were experimentally obtained based on values that are commonly used for ESC-50, and the hyper-parameters of the MTST were obtained via a greedy search. The ranges of the greedy searches were as follows: the steps were [50, 100, 150, 200, 250, 300] and x_size was [224, 401, 501, 601, 701, 801]. M was a method of converting raw data into a mel-spectrogram, with conversion of power to decibels (dB).…”
Section: Audio Feature Extraction Via Multi-Time-Scale Transform
confidence: 99%
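The conversion the quote describes (raw waveform to mel-spectrogram, then power to decibels) can be sketched in plain NumPy. The frame size, hop, and number of mel bands below are illustrative placeholders, not the tuned hyper-parameters from the quoted work:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centers spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):          # rising slope
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):          # falling slope
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mel_spectrogram_db(x, sr, n_fft=1024, hop=512, n_mels=64):
    # Frame the signal, apply a Hann window, take the power spectrum.
    n_frames = 1 + (len(x) - n_fft) // hop
    win = np.hanning(n_fft)
    frames = np.stack([x[i * hop:i * hop + n_fft] * win
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Pool power spectra through the mel filterbank.
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    # Convert power to decibels, as in the quoted pipeline.
    return 10.0 * np.log10(np.maximum(mel, 1e-10))
```

For a one-second 16 kHz signal this yields a (30, 64) matrix of log mel energies; production code would typically use a library routine (e.g. librosa's `melspectrogram` plus `power_to_db`) rather than this hand-rolled version.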
“…Accuracy (%):

Model                         ESC-10    ESC-50
Human [34]                    95.70     81.30
EnvNet-v2 [14]                88.80     81.6 ± 0.2
TEO-GSTC (GSTC) [54]          -         81.95
ConvRBM-BANK (GTSC) [12]      -         83.00
Kumar-CNN [55]                -         83.50
CNN+Augment+Mixup [56]        91.70     83.90
Multi-stream+Attention [57]   94.20     84.00
PEFBEs (FBEs) [58]            -         84.15

Few state-of-the-art models report their size and computation requirements in the literature. According to our knowledge of the current literature, EnvNet-v2 and AclNet are the only two of the top-ten models for either of the two datasets to report parameter count, size, and FLOPs.…”
Section: Network
confidence: 99%
“…Since the DCT is a linear transform, some information in highly non-linear speech signals is undesirably discarded. For this reason, MFBE is becoming increasingly popular with the rapid growth of end-to-end deep-learning techniques, because it can provide more information than MFCC for neural networks to delve into [15], [16], [42].…”
Section: Acoustic Feature
confidence: 99%
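The distinction the quote draws can be shown in a few lines: MFCCs are a (usually truncated) orthonormal DCT of the log mel filterbank energies, so the truncation discards higher-order coefficients, while MFBE feeds the full log mel vector to the network. The array below is random stand-in data, not real features:

```python
import numpy as np
from scipy.fft import dct

# Stand-in log mel filterbank energies, shape (n_frames, n_mels).
rng = np.random.default_rng(0)
log_mel = rng.standard_normal((4, 40))

# MFCC: DCT-II along the mel axis, truncated to the first 13 coefficients;
# the higher-order (fine spectral detail) terms are thrown away.
mfcc = dct(log_mel, type=2, norm='ortho', axis=1)[:, :13]

# MFBE: the full log mel energies, with no DCT and no truncation,
# so nothing is discarded before the network sees the features.
mfbe = log_mel

print(mfcc.shape, mfbe.shape)  # (4, 13) (4, 40)
```

Because the orthonormal DCT is invertible, keeping all 40 coefficients would lose nothing; it is the conventional truncation to ~13 coefficients that removes information, which is the argument the quoted passage makes for using MFBE with neural networks.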