Yazhe Li scite author profile

In order to efficiently transmit and store speech signals, speech codecs create a minimally redundant representation of the input signal which is then decoded at the receiver with the best possible perceptual quality. In this work we demonstrate that a neural network architecture based on VQ-VAE with a WaveNet decoder can be used to perform very low bit-rate speech coding with high reconstruction quality. A prosody-transparent and speaker-independent model trained on the LibriSpeech corpus coding audio at 1.6 kbps exhibits perceptual quality which is around halfway between the MELP codec at 2.4 kbps and AMR-WB codec at 23.05 kbps. In addition, when training on high-quality recorded speech with the test speaker included in the training set, a model coding speech at 1.6 kbps produces output of similar perceptual quality to that generated by AMR-WB at 23.05 kbps.
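The bit-rate comparison in the abstract can be checked with a little arithmetic. The sketch below is illustrative only: the three bit rates come from the abstract, but the code rate and codebook size passed to `vq_bitrate_bps` are hypothetical values chosen to land on 1.6 kbps, not the configuration reported by the authors.

```python
import math


def vq_bitrate_bps(codes_per_second: float, codebook_size: int) -> float:
    """Bit rate of a stream of discrete codes, each an index into a codebook."""
    # Each code needs log2(codebook_size) bits when indices are sent uncompressed.
    return codes_per_second * math.log2(codebook_size)


# Bit rates quoted in the abstract, in kbps.
VQ_VAE_KBPS = 1.6
MELP_KBPS = 2.4
AMR_WB_KBPS = 23.05

# Hypothetical configuration (an assumption, not from the source):
# 200 codes per second drawn from a 256-entry codebook (8 bits per code).
hypothetical_bps = vq_bitrate_bps(200, 256)

# How many times more bits the reference codecs spend than the 1.6 kbps model.
melp_ratio = MELP_KBPS / VQ_VAE_KBPS
amr_wb_ratio = AMR_WB_KBPS / VQ_VAE_KBPS

print(f"hypothetical VQ stream: {hypothetical_bps:.0f} bps")
print(f"MELP uses {melp_ratio:.2f}x the bits, AMR-WB uses {amr_wb_ratio:.2f}x")
```

Under these numbers, the AMR-WB reference spends roughly fourteen times as many bits per second as the 1.6 kbps model that the abstract reports as reaching similar perceptual quality in the speaker-dependent setting.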


A flow-difference feedback iteration method and its application to high-speed aerostatic journal bearings

Zhou, Zhang. 2015. Tribology International.


Identification of landslide spatial distribution and susceptibility assessment in relation to topography in the Xi’an Region, Shaanxi Province, China

et al. 2015





Representation Learning with Contrastive Predictive Coding

Parallel WaveNet: Fast High-Fidelity Speech Synthesis

Low Bit-rate Speech Coding with VQ-VAE and a WaveNet Decoder
