Interspeech 2017
DOI: 10.21437/interspeech.2017-1510

Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home

Cited by 217 publications (187 citation statements). References: 0 publications.
“…In addition to the diverse training sets described in Sec. 3.1 and 3.2, multi-condition training (MTR) [27,28] and random downsampling of data to 8 kHz [29] are also used to further increase data diversity. Noisy data is generated at signal-to-noise ratios (SNR) from 0 to 30 dB, with an average SNR of 12 dB, and with T60 reverberation times ranging from 0 to 900 msec, averaging 500 msec.…”
Section: Methods
confidence: 99%
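The excerpt above describes a multi-condition training (MTR) setup only in terms of ranges and means. Below is a minimal sketch of how such augmentation is commonly implemented, assuming an exponentially decaying white-noise model for the room impulse response and uniform sampling of SNR and T60; the excerpt does not state the actual sampling distributions (uniform draws over [0, 30] dB would average 15 dB, not the reported 12 dB), and all function names here are hypothetical.

import numpy as np

def synthetic_rir(t60, fs=16000):
    """Exponentially decaying white-noise RIR, a common simple model.
    The decay rate is set so energy drops 60 dB after t60 seconds."""
    n = int(max(t60, 0.01) * fs)
    t = np.arange(n) / fs
    # 60 dB energy decay over t60 s -> amplitude envelope exp(-6.908 t / t60)
    decay = np.exp(-6.908 * t / max(t60, 1e-3))
    rir = np.random.randn(n) * decay
    return rir / np.max(np.abs(rir))

def add_noise_at_snr(speech, noise, snr_db):
    """Scale noise so the speech-to-noise power ratio equals snr_db."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

def mtr_example(speech, noise, fs=16000):
    # Placeholder uniform draws over the ranges reported in the excerpt.
    snr_db = np.random.uniform(0.0, 30.0)   # SNR in [0, 30] dB
    t60 = np.random.uniform(0.0, 0.9)       # T60 in [0, 900] msec
    if t60 > 0:
        speech = np.convolve(speech, synthetic_rir(t60, fs))[: len(speech)]
    noise = np.resize(noise, len(speech))   # tile/trim noise to match
    return add_noise_at_snr(speech, noise, snr_db)

A single call such as mtr_example(clean_utterance, noise_recording) then yields one reverberant, noisy training utterance; repeating the call gives a fresh SNR/T60 draw each time.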
“…For a certain filterbank energy value e[m, c], let us define the following term η and the function f, the ratio of e[m, c] to e_peak in decibels (dB). Fig. 1a shows the probability density function of the time-frequency bins with respect to η defined in (3). In obtaining this distribution, we used 1,000 randomly chosen utterances from the LibriSpeech training corpus [14].…”
Section: Distribution Of Energy In Time-Frequency Bins Of Speech Signals
confidence: 99%
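The snippet refers to an equation (3) defining η that is not reproduced here; only the dB ratio f is described in prose. A minimal sketch of that ratio, f(e[m, c]) = 10·log10(e[m, c] / e_peak), is below, assuming e_peak is the utterance-level maximum energy and using STFT power bins as a stand-in for filterbank channels; the function names and these choices are assumptions, not the cited paper's exact definitions.

import numpy as np

def frame_power_spectrum(x, frame_len=400, hop=160, n_fft=512):
    """Power spectrogram of shape [frames m, channels c] via a
    short-time Fourier transform (numpy only)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, n_fft)) ** 2

def db_ratio_to_peak(energies, floor_db=-120.0):
    """f(e[m, c]) = 10 * log10(e[m, c] / e_peak), clipped at a floor.
    e_peak is taken as the utterance-level maximum; the exact η of the
    cited paper's Eq. (3) is not reproduced in the snippet."""
    e_peak = energies.max()
    return np.maximum(10.0 * np.log10(energies / e_peak + 1e-20), floor_db)

# Toy usage on a synthetic signal standing in for a LibriSpeech utterance.
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)          # 1 s of noise at 16 kHz
f = db_ratio_to_peak(frame_power_spectrum(x))
hist, edges = np.histogram(f.ravel(), bins=60, density=True)

Pooling such histograms over many utterances gives the kind of empirical density over time-frequency bins that the cited work plots in its Fig. 1a.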
“…These improvements have been obtained by the shift from Gaussian Mixture Models (GMM) to Feed-Forward Deep Neural Networks (FF-DNNs), and from FF-DNNs to Recurrent Neural Networks (RNN) such as Long Short-Term Memory (LSTM) networks [2]. Thanks to these advances, voice assistant devices such as Google Home [3], Amazon Alexa and Samsung Bixby [4] are widely used in home environments.…”
Section: Introduction
confidence: 99%
“…This improvement has come about from the shift from Gaussian Mixture Models (GMM) to Feed-Forward Deep Neural Networks (FF-DNNs), and from FF-DNNs to Recurrent Neural Networks (RNN), in particular Long Short-Term Memory (LSTM) networks [9]. Thanks to these advances, voice assistant devices such as Google Home [2,10], Amazon Alexa, and Samsung Bixby [11] are being used in many homes and on personal devices.…”
Section: Introduction
confidence: 99%