“…On the other hand, several methods are commonly used to handle prosodic parameters such as fundamental frequency (F0), including a simple mean/variance linear transformation, a contour-based transformation [13], GMM-based mapping [14], and neural networks [15]. For waveform generation, approaches include the source-filter vocoder system [16], the more recent direct waveform modification technique [2], and state-of-the-art WaveNet modeling [17,18,19].…”
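The simplest of the F0 methods listed above, the mean/variance linear transformation, can be sketched in a few lines. This is a minimal illustration, not the cited authors' implementation. It assumes the common conventions that statistics are computed in the log-F0 domain, that only voiced frames (F0 > 0) are used, and that unvoiced frames are coded as 0 and passed through unchanged. The function names `fit_log_f0_stats` and `convert_f0` are hypothetical.

```python
import math
from statistics import mean, pstdev


def fit_log_f0_stats(f0):
    """Return (mean, std) of log-F0 over voiced frames.

    Assumption: unvoiced frames are coded as 0 and are excluded.
    """
    logs = [math.log(v) for v in f0 if v > 0]
    return mean(logs), pstdev(logs)


def convert_f0(f0, src_stats, tgt_stats):
    """Map a source F0 track to the target speaker's range.

    Each voiced frame is shifted and scaled in the log domain:
        log y = (log x - mu_x) / sd_x * sd_y + mu_y
    Unvoiced frames (0) stay unvoiced.
    """
    mu_x, sd_x = src_stats
    mu_y, sd_y = tgt_stats
    if sd_x == 0:
        raise ValueError("source log-F0 variance is zero; cannot rescale")
    out = []
    for v in f0:
        if v > 0:
            out.append(math.exp((math.log(v) - mu_x) / sd_x * sd_y + mu_y))
        else:
            out.append(0.0)
    return out
```

For example, if the source speaker's voiced frames span 100 to 200 Hz and the target's span 200 to 400 Hz with the same log-domain spread, `convert_f0` maps 100 Hz to about 200 Hz and 200 Hz to about 400 Hz, leaving unvoiced frames at 0. Because only the first two moments are matched, this transformation cannot reshape the contour itself, which is the motivation for the contour-based, GMM-based, and neural approaches cited above.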