Interspeech 2020
DOI: 10.21437/interspeech.2020-1156

FeatherWave: An Efficient High-Fidelity Neural Vocoder with Multi-Band Linear Prediction

Abstract: In this paper, we propose FeatherWave, yet another variant of the WaveRNN vocoder, combining multi-band signal processing and linear predictive coding. LPCNet, a recently proposed neural vocoder that utilizes the linear predictive characteristic of speech signals in the WaveRNN architecture, can generate high-quality speech faster than real time on a single CPU core. However, LPCNet is still not efficient enough for online speech generation tasks. To address this issue, we adopt the mul…
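The idea the abstract attributes to LPCNet is that a speech sample can be partly predicted as a weighted sum of the preceding samples, leaving the neural network to model only the residual. Below is a minimal sketch of that linear prediction step, assuming the textbook autocorrelation method with the Levinson-Durbin recursion. The function names `lpc_coefficients` and `lpc_predict` are hypothetical, and the demo fits a synthetic AR(2) signal rather than real speech. It is not FeatherWave's or LPCNet's actual code.

```python
import numpy as np


def lpc_coefficients(frame: np.ndarray, order: int) -> np.ndarray:
    """Return a[1..order] such that x[n] ~= sum_k a[k] * x[n-k].

    Textbook autocorrelation method solved with the Levinson-Durbin recursion.
    """
    n = len(frame)
    # Biased autocorrelation r[0..order]
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)  # a[0] is unused; predictor taps live in a[1..order]
    err = r[0]               # prediction error energy of the order-0 predictor
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_prev = a.copy()
        a[i] = k
        a[1:i] = a_prev[1:i] - k * a_prev[i - 1:0:-1]
        err *= 1.0 - k * k
    return a[1:]


def lpc_predict(x: np.ndarray, a: np.ndarray):
    """Predict each sample from past samples only; return (prediction, residual)."""
    taps = np.concatenate(([0.0], a))  # zero lag-0 tap: only past samples are used
    pred = np.convolve(x, taps)[: len(x)]
    return pred, x - pred


# Demo on a synthetic AR(2) process (stand-in for speech): x[n] = 1.6 x[n-1] - 0.8 x[n-2] + e[n]
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for n in range(len(e)):
    x[n] = e[n]
    if n >= 1:
        x[n] += 1.6 * x[n - 1]
    if n >= 2:
        x[n] -= 0.8 * x[n - 2]

a = lpc_coefficients(x, order=2)
pred, residual = lpc_predict(x, a)
```

In an LPC-based vocoder, this cheap prediction is computed from already generated samples, so the network's output only has to account for the much smaller residual.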

Cited by 15 publications (9 citation statements)
References 22 publications

“…Another technique that is widely used to speed up the inference of vocoders is subband modeling, which divides the waveform into multiple subbands for fast inference. Typical models include DurIAN [411], multi-band MelGAN [400], subband WaveNet [244], and multi-band LPCNet [342]. Bunched LPCNet [364] reduces the computation complexity of LPCNet with sample bunching and bit bunching, achieving more than 2x speedup.…”
Section: Adaptive (mentioning, confidence: 99%)
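Subband modeling, as described in the quote above, splits the waveform into several subbands, each at a lower sample rate, so an autoregressive model needs proportionally fewer sequential steps. As a minimal sketch of the principle, the code below swaps in the simplest possible filter bank, a two-band Haar analysis/synthesis pair applied twice to give four subbands. Practical multi-band vocoders commonly use longer pseudo-QMF (PQMF) filters with better frequency separation. All function names here are hypothetical.

```python
import numpy as np


def haar_analysis(x: np.ndarray):
    """Split x (even length) into low and high subbands, each at half the rate."""
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)
    high = (even - odd) / np.sqrt(2.0)
    return low, high


def haar_synthesis(low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Invert haar_analysis exactly by re-interleaving even and odd samples."""
    x = np.empty(2 * len(low))
    x[0::2] = (low + high) / np.sqrt(2.0)
    x[1::2] = (low - high) / np.sqrt(2.0)
    return x


def split_4band(x: np.ndarray):
    """Two Haar levels -> four subbands at 1/4 rate (length must be divisible by 4)."""
    low, high = haar_analysis(x)
    return [*haar_analysis(low), *haar_analysis(high)]


def merge_4band(bands):
    """Recombine the four subbands into a full-rate waveform."""
    b0, b1, b2, b3 = bands
    return haar_synthesis(haar_synthesis(b0, b1), haar_synthesis(b2, b3))


if __name__ == "__main__":
    sig = np.random.default_rng(1).standard_normal(16000)
    bands = split_4band(sig)
    print([len(b) for b in bands], np.max(np.abs(merge_4band(bands) - sig)))
```

Each subband has one quarter of the original length, and synthesis reconstructs the input up to floating-point error. A multi-band vocoder relies on this (exact for Haar, near-perfect for PQMF) when it merges generated subbands back into a full-band waveform.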
“…Consequently, Full-band LPCNet is the only neural vocoder that can realize real-time and high-fidelity speech synthesis with a sampling frequency of 48 kHz using a CPU. As future work, Full-band LPCNet can be made much faster by applying acceleration methods, such as the subband [20], [33], [50], [52], sample bunching [51] and tensor decomposition [48] methods. Additionally, Full-band LPCNet can be extended to multi-speaker neural vocoder to synthesize the speech waveforms of many and unspecified speakers that were not included in training [75].…”
Section: Subjective Evaluation (mentioning, confidence: 99%)
“…In singing voice synthesis, we found it necessary to adjust the batch length of the input features appropriately. Although we performed only a simple extension for fullband synthesis in this study, acceleration methods such as subband [20], [33], [50], [52], sample bunching [51], and tensor decomposition [48] methods can be directly applied to Full-band LPCNet to further improve the synthesis speed.…”
Section: Introduction (mentioning, confidence: 99%)
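The acceleration methods listed in these quotes share one arithmetic effect on autoregressive models such as Full-band LPCNet: they reduce the number of sequential network evaluations per second of audio. The helper below is a back-of-the-envelope sketch under stated assumptions (one sample per subband per step, and `bunch_size` samples per subband per step with sample bunching). The function names and the 48 kHz, 4-band, bunch-of-2 configuration are hypothetical illustrations, not settings from the cited papers. The result is an idealized upper bound on speedup, since each step also becomes more expensive.

```python
def sequential_steps_per_second(sample_rate_hz: float, num_subbands: int = 1,
                                bunch_size: int = 1) -> float:
    """Idealized count of sequential network evaluations per second of audio.

    Assumption (hypothetical model): each step emits `bunch_size` samples
    in each of `num_subbands` subbands.
    """
    if sample_rate_hz <= 0 or num_subbands < 1 or bunch_size < 1:
        raise ValueError("sample rate must be positive; subbands and bunch size >= 1")
    return sample_rate_hz / (num_subbands * bunch_size)


def per_step_budget_us(sample_rate_hz: float, num_subbands: int = 1,
                       bunch_size: int = 1) -> float:
    """Wall-clock time per step (microseconds) available while staying real-time."""
    return 1e6 / sequential_steps_per_second(sample_rate_hz, num_subbands, bunch_size)


if __name__ == "__main__":
    for bands, bunch in [(1, 1), (4, 1), (4, 2)]:
        steps = sequential_steps_per_second(48000, bands, bunch)
        budget = per_step_budget_us(48000, bands, bunch)
        print(f"bands={bands} bunch={bunch}: {steps:.0f} steps/s, {budget:.1f} us/step")
```

At 48 kHz, four subbands with a bunch size of 2 leave 48000 / (4 × 2) = 6000 sequential steps per second, or roughly 167 µs per step instead of about 21 µs for plain sample-by-sample generation.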
“…In contrast to real-time autoregressive neural vocoders such as WaveRNN [4], LPCNet [5], and FeatherWave [6], non-autoregressive models, which simultaneously synthesize all speech waveform samples, can be easily implemented as real-time neural vocoders, and many models have been investigated. Non-autoregressive neural vocoders are broadly categorized into two types.…”
Section: Introduction (mentioning, confidence: 99%)
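The last quote contrasts autoregressive vocoders, where each output sample depends on previously generated ones, with non-autoregressive models that synthesize all samples simultaneously. The two toy functions below are hypothetical and contain no trained model. They only illustrate the dependency structure: the autoregressive loop cannot be parallelized over time, whereas the non-autoregressive version is a single vectorized operation over the whole conditioning sequence.

```python
import numpy as np


def autoregressive_generate(cond: np.ndarray, w: float) -> np.ndarray:
    """Toy AR generator: sample n depends on sample n-1, forcing a sequential loop."""
    out = np.zeros(len(cond))
    prev = 0.0
    for n in range(len(cond)):
        prev = np.tanh(w * prev + cond[n])  # feedback from the previous output
        out[n] = prev
    return out


def non_autoregressive_generate(cond: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Toy non-AR generator: each sample depends only on nearby conditioning values,
    so all samples are computed at once (one convolution and one tanh)."""
    return np.tanh(np.convolve(cond, kernel, mode="same"))


if __name__ == "__main__":
    c = np.linspace(-1.0, 1.0, 64)
    ar = autoregressive_generate(c, w=0.9)
    nar = non_autoregressive_generate(c, np.array([0.25, 0.5, 0.25]))
    print(ar.shape, nar.shape)
```

Perturbing an early conditioning value changes every later autoregressive output through the feedback path, but only a few neighboring non-autoregressive outputs, which is why the latter maps naturally onto parallel hardware.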