2022
DOI: 10.1016/j.csl.2021.101308
Generative adversarial networks for speech processing: A review

Cited by 47 publications (23 citation statements) · References 35 publications
“…Most of the existing models of lexical learning focus primarily on either ASR/speech-to-text (perception) or text-to-speech/speech synthesis (production; see [3] for an overview). Variational Autoencoders (VAEs) involve both an encoder and a decoder, which allows unsupervised acoustic word embedding as well as generation of speech, but these proposals only use VAEs for either unsupervised ASR [4, 5, 6, 7] or for speech synthesis/transformation (e.g.…”
Section: Prior Work
confidence: 99%
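The excerpt above notes that a VAE pairs an encoder (mapping speech features to a latent code, usable as an acoustic word embedding) with a decoder (mapping latent codes back to features, usable for generation). A minimal numpy sketch of that structure is below; the dimensions, random weights, and function names are illustrative assumptions, not taken from any of the cited models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: one "frame" of acoustic features and a small latent space.
FEAT_DIM, HID_DIM, LAT_DIM = 16, 8, 2

# Randomly initialised weights stand in for trained parameters.
W_enc = rng.normal(scale=0.1, size=(FEAT_DIM, HID_DIM))
W_mu = rng.normal(scale=0.1, size=(HID_DIM, LAT_DIM))
W_logvar = rng.normal(scale=0.1, size=(HID_DIM, LAT_DIM))
W_dec = rng.normal(scale=0.1, size=(LAT_DIM, FEAT_DIM))

def encode(x):
    """Map an acoustic feature vector to the parameters of q(z|x)."""
    h = np.tanh(x @ W_enc)
    return h @ W_mu, h @ W_logvar  # mean and log-variance of the latent code

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps (the reparameterisation trick)."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    """Map a latent code back to the feature space (speech generation)."""
    return np.tanh(z @ W_dec)

x = rng.normal(size=FEAT_DIM)               # one toy "speech frame"
mu, logvar = encode(x)                      # mu acts as the acoustic embedding
x_hat = decode(reparameterize(mu, logvar))  # reconstructed/generated frame
```

The same two halves support the two uses the excerpt contrasts: keeping only `encode` gives unsupervised embeddings for ASR-style tasks, while sampling `z` and calling `decode` gives synthesis.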
“…The majority of popular ASR systems use Gaussian Mixture Models (GMMs), Hidden Markov Models (HMMs), and Deep Neural Networks (DNNs) [12, 13, 14, 15]. DNNs play an essential part in the building of ASR systems [16, 17], mostly because of the evolution of unique neural network models, as well as training and classification techniques [18, 19]. They have also been applied to problems such as feature extraction [20, 21], audio signal classification [22, 23], text recognition and TTS [24, 25], disordered speech processing [26], and speech recognition based on small and large vocabulary [27].…”
Section: Introduction
confidence: 99%
“…Generally, pattern recognition systems consist of two main components: feature analysis and pattern classification. Most state-of-the-art speech recognition systems are based on hidden Markov models (HMMs) or artificial neural networks (ANNs), or HMM and ANN hybrids [12, 13, 14, 15]. Neural networks play an important role both in speech [15, 16, 17] and speaker recognition [18, 19, 20, 21], mainly due to the development of new neural network topologies as well as training and classification algorithms [14, 22, 23].…”
Section: Introduction
confidence: 99%
“…Most state-of-the-art speech recognition systems are based on hidden Markov models (HMMs) or artificial neural networks (ANNs), or HMM and ANN hybrids [12, 13, 14, 15]. Neural networks play an important role both in speech [15, 16, 17] and speaker recognition [18, 19, 20, 21], mainly due to the development of new neural network topologies as well as training and classification algorithms [14, 22, 23]. They have also been used for tasks such as classification [12, 24, 25] or feature extraction [26, 27], isolated word recognition [28], small and large vocabulary and continuous speech recognition [29, 30], as well as in disordered speech processing [7, 8, 12, 13, 31, 32, 33, 34, 35, 36].…”
Section: Introduction
confidence: 99%