A growing number of applications can benefit from edge-based text-to-speech (TTS). However, most existing TTS models are too computationally expensive and not flexible enough to be deployed across the wide variety of edge devices and their equally varied computational capacities. To address this, we propose FBWave, a family of efficient and scalable neural vocoders that can achieve optimal performance-efficiency trade-offs for different edge devices. FBWave is a hybrid flow-based generative model that combines the advantages of autoregressive and non-autoregressive models. It produces high-quality audio and supports streaming during inference while remaining highly computationally efficient. Our experiments show that FBWave can achieve audio quality similar to WaveRNN while reducing multiply-accumulate operations (MACs) by 40x. More efficient variants of FBWave can achieve up to 109x fewer MACs while still delivering acceptable audio quality. Audio demos are available at https://bichenwu09.github.io/vocoder_demos.
Portable devices for the consumer market are becoming available in large quantities. Because of their design and use, human speech is often the input modality of choice, for example in car navigation systems or portable speech-to-speech translation devices. In this paper we describe our work in porting our existing desktop-PC-based speech recognition system to an off-the-shelf PDA running Windows CE 3.0. We do this in such a way that our already well-performing language and acoustic models can be carried over without retraining them for the PDA. To achieve acceptable run-time behavior, we apply several optimization techniques to the preprocessing and decoding stages. Among other things, we introduce a newly developed technique, early feature vector reduction. In this way, the execution time of our recognition system is reduced from an initial 28x real-time to 2.6x real-time with a tolerable increase in word error rate. The size of the acoustic models is reduced to 25% of the original.
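
To make the run-time figures concrete, the following minimal sketch works through the arithmetic behind "28x real-time" and "2.6x real-time", assuming the usual definition of the real-time factor as processing time divided by audio duration. The function names and the 10-second utterance are illustrative assumptions and do not come from the paper.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration (assumed standard definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speedup(rtf_before: float, rtf_after: float) -> float:
    """How many times faster the system runs after optimization."""
    return rtf_before / rtf_after


# Figures quoted in the abstract.
RTF_INITIAL = 28.0
RTF_OPTIMIZED = 2.6

# Hypothetical utterance length, chosen only for illustration.
utterance_seconds = 10.0

# Time needed to decode the hypothetical utterance before and after optimization.
time_initial = RTF_INITIAL * utterance_seconds
time_optimized = RTF_OPTIMIZED * utterance_seconds

print(f"decode time before: {time_initial:.0f} s, after: {time_optimized:.0f} s")
print(f"speedup: {speedup(RTF_INITIAL, RTF_OPTIMIZED):.1f}x")
```

Under these assumptions, the optimized system is roughly an order of magnitude faster but still runs slower than real time, since its real-time factor remains above 1.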