2021
DOI: 10.1109/access.2021.3118033

PeriodNet: A Non-Autoregressive Raw Waveform Generative Model With a Structure Separating Periodic and Aperiodic Components

Abstract: This paper presents PeriodNet, a non-autoregressive (non-AR) waveform generative model with a new model structure for modeling periodic and aperiodic components in speech waveforms. Non-AR raw waveform generative models have enabled the fast generation of high-quality waveforms. However, the variations of waveforms that these models can reconstruct are limited by training data. In addition, typical non-AR models reconstruct a speech waveform from a single Gaussian input despite the mixture of periodic and aper…

Cited by 7 publications (13 citation statements)
References 37 publications
“…As we all know, the text is converted into lyrics and sung by the computer from musical scores using the SVS system based on HMM, but because the voice is easily distinguishable from the natural singing voice, we can also implement the SVS using DNN, and DNN-based systems provide more utilized and efficient results when compared to HMM-based systems. It also outperforms the HMM-based system in subjective listening [10]. The PeriodNet system, which converts/models periodic and aperiodic waveforms into speech waveforms, can also be implemented.…”
Section: Literature Review
mentioning
confidence: 99%
“…In the case of neural vocoders, the synthesis quality deteriorates when the input f_o is not included in the range of the training data. Several approaches have been proposed to solve this problem [33], [34], [35], [36], [37], [38], [39]. In contrast to AR models [33], [34], [35], non-AR models [36], [37], [38], [39] can realize real-time inference.…”
mentioning
confidence: 99%
“…Several approaches have been proposed to solve this problem [33], [34], [35], [36], [37], [38], [39]. In contrast to AR models [33], [34], [35], non-AR models [36], [37], [38], [39] can realize real-time inference. The neural source filter [36] introduces nonlinear filtering and dilated convolutional layers for parametrically generated source excitation signals corresponding to f_o by source-filter modeling [40].…”
mentioning
confidence: 99%