2019
DOI: 10.1109/jstsp.2019.2908700

Deep Learning for Audio Signal Processing

Abstract: Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and…

Cited by 604 publications (322 citation statements)
References 118 publications
“…It has made a remarkable impact in computer vision performance previously unattainable on many tasks such as image classification and object detection. Deep learning is applied in research concerning graphical modeling, pattern recognition, signal processing [1], computer vision [2], speech recognition [3], language recognition [4,5], audio recognition [6], and face recognition (FR) [7]. In biometrics, deep learning can be used to represent the unique biometric data and make improvements in the performance of many authentication and recognition systems.…”
Section: Introduction (mentioning)
confidence: 99%
“…Hidden Markov Models) at the end of the 1980s [2,42] and the application of deep neural networks (DNNs), starting from around 2005 [41]. Initially, these networks were used to classify individual signal frames in terms of sub-phonemes, leaving the HMMs the task of recognizing the sequence of observations, which took the form of a hybrid DNN-HMM solution.…”
Section: Speech Recognition (unclassified)
“…In the second method, a further factor decomposition is carried out first, this time of the i-vector space, modeling the disturbances, followed by classification using advanced stochastic distance measures. − Deep neural networks have also been applied to speaker recognition [8,41]. Research concerns the use of these networks for speaker modeling (finding, during training, a nonlinear feature transformation that replaces the Gaussian mixture model) as well as at the stage of matching observations to the model [28].…”
Section: Speaker Recognition (unclassified)
“…Results are encouraging due to during the learning phase, an accuracy greater than 77% is achieved. In [25], the authors provide a review of the state-of-the-art deep learning techniques for audio signal processing. Analyzed works range from variants of the long short-term memory architecture, audio-specific neural network models, and also it includes convolution neural networks.…”
Section: Introduction (mentioning)
confidence: 99%