The RCELP speech‐coding algorithm

Kleijn, W. Bastiaan; Kroon, P.; Nahumi, D.

doi:10.1002/ett.4460050508

Cited by 32 publications

(1 citation statement)

References 16 publications

“…Speech compression aims to reduce the bitrate required to represent a speech signal. In classical coding methods [1][2][3][4][5][6][7], all processing was based on knowledge of human experts only. Recent advances in speech coding follow progress in speech synthesis [8][9][10] by replacing the decoder [11][12][13] as well as the quantizer [14] with a machine-learning (ML) based model that significantly improves the coding quality.…”

Section: Introduction (mentioning)

confidence: 99%

Ultra-Low-Bitrate Speech Coding with Pretrained Transformers

Siahkoohi, Chinen, Denton et al. 2022

Preprint

Self Cite

Speech coding facilitates the transmission of speech over low-bandwidth networks with minimal distortion. Neural-network based speech codecs have recently demonstrated significant improvements in quality over traditional approaches. While this new generation of codecs is capable of synthesizing high-fidelity speech, their use of recurrent or convolutional layers often restricts their effective receptive fields, which prevents them from compressing speech efficiently. We propose to further reduce the bitrate of neural speech codecs through the use of pretrained Transformers, capable of exploiting long-range dependencies in the input signal due to their inductive bias. As such, we use a pretrained Transformer in tandem with a convolutional encoder, which is trained end-to-end with a quantizer and a generative adversarial net decoder. Our numerical experiments show that supplementing the convolutional encoder of a neural speech codec with Transformer speech embeddings yields a speech codec with a bitrate of 600 bps that outperforms the original neural speech codec in synthesized speech quality when trained at the same bitrate. Subjective human evaluations suggest that the quality of the resulting codec is comparable to or better than that of conventional codecs operating at three to four times the rate.
