Dynamic Frequency Warping (DFW) is widely used to align the spectra of different speakers. It has long been argued that frequency warping captures inter-speaker differences, but in practice DFW always requires a delicate preprocessing step to remove spectral tilt. The DFW residual has been used successfully in Voice Morphing to improve the quality and similarity of synthesized speech, but its estimation remains largely heuristic and sub-optimal. This paper presents a dynamic programming algorithm that simultaneously estimates the Optimal Frequency Warping and Weighting transform (ODFWW) and therefore needs neither a preprocessing step nor fine-tuning, while source- and target-speaker data are matched using the Matching-Minimization algorithm [1]. The transform is used to morph the output of a state-of-the-art Vocaine-based [2] TTS synthesizer in order to generate different voices at runtime with only 8% computational overhead. Some morphed TTS voices exhibit significantly higher quality than the original voice, as morphing seems to "correct" the voice characteristics of the TTS voice.
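To make the dynamic programming idea in the abstract above concrete, the following is a minimal sketch of plain DFW only: it aligns two log-magnitude spectra along the frequency axis with a DTW-style recursion. It is not the ODFWW algorithm, since it estimates no weighting term and performs no spectral-tilt handling. The function name `dfw_path`, the squared-difference local cost, and the three allowed step directions are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def dfw_path(src, tgt):
    """Align two 1-D log-spectra along frequency (illustrative sketch).

    Returns the warping path as a list of (source_bin, target_bin)
    pairs and the accumulated squared-difference cost of that path.
    """
    src = np.asarray(src, dtype=float)
    tgt = np.asarray(tgt, dtype=float)
    n, m = len(src), len(tgt)

    # Local cost: squared difference between every pair of bins (assumed).
    cost = (src[:, None] - tgt[None, :]) ** 2

    # Accumulated cost, filled with the standard three-step recursion.
    acc = np.full((n, m), np.inf)
    acc[0, 0] = cost[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = np.inf
            if i > 0:
                best = min(best, acc[i - 1, j])
            if j > 0:
                best = min(best, acc[i, j - 1])
            if i > 0 and j > 0:
                best = min(best, acc[i - 1, j - 1])
            acc[i, j] = cost[i, j] + best

    # Backtrack from the last bin pair to the first to recover the path.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1, j], i - 1, j))
        if j > 0:
            candidates.append((acc[i, j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return path, float(acc[-1, -1])
```

The returned path can be read as a monotonic frequency-warping function from source bins to target bins. In a sketch like this, whatever spectral difference remains after warping would still have to be modeled separately, which is the step the abstract describes as heuristic in conventional DFW.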
Kernelized eigenvoice methods, which apply a nonlinear transform in speaker space, have previously been proposed for rapid adaptation. This paper examines and addresses a number of limitations of the current schemes. First, the requirements for valid probability functions using kernel representations are discussed. Second, rapid speaker adaptation using these forms of representation is analyzed, and the general update formulae for kernelized eigenvoice adaptation are derived. The existing kernelized eigenvoice methods are then described within this formulation, which allows EM-based, rather than gradient-descent-based, parameter estimation. To enable these approaches to be applied to large-vocabulary speech recognition tasks, eigenbases using transformations of an underlying canonical model are described and related to existing adaptation methods. Finally, preliminary experiments on a large-vocabulary conversational telephone speech task are detailed.