Reverberant speech recognition

Haeb‐Umbach, Reinhold; Krueger, Alexander

doi:10.1016/b978-0-12-802398-3.00009-x

Cited by 4 publications

(1 citation statement)

References 84 publications

“…A wide range of noise-robust techniques developed over past 30 years can be analyzed and categorized using five different criteria: (1) feature-domain versus model-domain processing, (2) the use of prior knowledge about the acoustic environment distortion, (3) the use of explicit environment-distortion models, (4) deterministic versus uncertainty processing, and (5) the use of acoustic models trained jointly with the same feature enhancement or model adaptation process used in the testing stage. See a comprehensive review in [109,110] and additional review literature or original work in [111–114].…”

Section: Achievements Of Deep Learning In Speech Recognition (mentioning)

confidence: 99%

Deep learning: from speech recognition to language and multimodal processing

Deng

2016

SIP

Self Cite

While artificial neural networks have been in existence for over half a century, it was not until year 2010 that they had made a significant impact on speech recognition with a deep form of such networks. This invited paper, based on my keynote talk given at Interspeech conference in Singapore in September 2014, will […]

Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA. Corresponding author: Li Deng. Email: deng@microsoft.com

I. INTRODUCTION

The main theme of this paper is to reflect on the recent history of how deep learning has profoundly revolutionized the field of automatic speech recognition (ASR) and to elaborate on what kind of lessons we can learn to not only further advance ASR technology but also to impact the related, arguably more important, applications in language and multimodal processing. Language processing concerns "downstream" analysis and distillation of information from the ASR systems' outputs. Semantic analysis of language and multimodal processing involving speech, text, and image, both experiencing rapid advances based on deep learning over the past few years, holds the potential to solve some difficult and remaining ASR problems and present new challenges for the deep learning technology.

A message to be conveyed in this paper is the importance of broadening deep learning from deep neural networks (DNNs) to include deep generative models as well. In fact, a brief historical review conducted in Section II will touch on how the development of deep (and dynamic) generative models of speech played a role in the inroads of DNNs into modern ASR. Since 2011, the DNN has taken over the dominating (shallow) generative model of speech, the Gaussian Mixture Model (GMM), as the output distribution in the Hidden Markov Model (HMM). This purely discriminative DNN has been well-known to the ASR community, which can be considered as a shallow network unfolding in space. When the unfolding occurs in time, we have the recurrent neural network (RNN). On the other hand, deep generative models have distinct advantages over discriminative DNNs, including the strengths of model interpretability, of embedding domain knowledge and causal relationships, and of modeling uncertainty. Deep generative and discriminative models represent two apparently opposing approaches yet with highly complementary strengths and weaknesses. The further success of deep learning is likely to lie in how to seamlessly integrate the two approaches in a practically effective and theoretically appealing fashion, and to achieve the best of both worlds.

The remainder of this paper is organized as follows. In Section II, some brief history is provided on how deep learning made inroad into speech recognition, and a number of enabling factors are discussed. Outstanding achievements of deep learning both in academic world and in industry to […]

Mobile Application Identification based on Hidden Markov Model

Yang, Xiao, et al.

2018

ITM Web Conf.

Abstract. With the increasing number of mobile applications, network management tasks have become more challenging. Users also face security issues with mobile Internet applications when using mobile network resources. Identifying the applications that correspond to network traffic can help network operators perform network management effectively. Existing mobile application recognition technology has two problems: it cannot recognize applications that use encryption protocols, and its scalability is poor. In this paper, a mobile application identification method based on the Hidden Markov Model (HMM) is proposed. The method extracts defined statistical characteristics from the different network flows generated when each application starts, derives the corresponding time series from the time information of those flows, and then establishes a separate HMM for each application to be identified. We test the proposed method on 10 common applications. The test results show that it has high accuracy and good generalization ability.

Modulation-Domain Kalman Filtering for Monaural Blind Speech Denoising and Dereverberation

Dionelis, Brookes

2019

IEEE/ACM Trans. Audio Speech Lang. Process.

We describe a monaural speech enhancement algorithm based on modulation-domain Kalman filtering to blindly track the time-frequency log-magnitude spectra of speech and reverberation. We propose an adaptive algorithm that performs blind joint denoising and dereverberation, while accounting for the inter-frame speech dynamics, by estimating the posterior distribution of the speech log-magnitude spectrum given the log-magnitude spectrum of the noisy reverberant speech. The Kalman filter update step models the non-linear relations between the speech, noise and reverberation log-spectra. The Kalman filtering algorithm uses a signal model that takes into account the reverberation parameters of the reverberation time, T60, and the direct-to-reverberant energy ratio (DRR) and also estimates and tracks the T60 and the DRR in every frequency bin to improve the estimation of the speech log-spectrum. The proposed algorithm is evaluated in terms of speech quality, speech intelligibility and dereverberation performance for a range of reverberation parameters and reverberant speech to noise ratios, in different noises, and is also compared to competing denoising and dereverberation techniques. Experimental results using noisy reverberant speech demonstrate the effectiveness of the enhancement algorithm.
