Voice conversion technologies transform the speech of one person (the source speaker) so that it is perceived as if another person (the target speaker) had said it. Traditionally, the transformation is applied either to natural speech or to synthetic speech, as a post-processing block after a synthesizer. The aim of this thesis is to study state-of-the-art voice conversion technologies and to incorporate them into a speech synthesis system. Achieving this goal requires a thorough understanding of the synthesis method to be used, and then studying and developing the conversion technology that best suits its characteristics. Two types of synthesizer have been studied:

1. A formant synthesizer that concatenates parametrized units. The synthesis parameters are the first five formants and four glottal source parameters of the LF (Liljencrants and Fant) model.
2. An LP (Linear Prediction) synthesizer that concatenates coded units. The source model is a sixth-order polynomial that shapes the integral of the LP residual, followed by a filter that enhances high frequencies.

In both cases, tools have been developed or modified to analyze 455 units corresponding to four speakers: two men and two women.

The voice conversion techniques developed for each synthesizer are:

1. For the formant synthesizer, a linear transformation is applied to convert the formants, and the LF model parameters of the target speaker are copied.
2. For the LP synthesizer, the transformation technique is codebook mapping.

The relevance of the formant synthesizer parameters to speaker identity has been studied. The study concludes that information about speaker identity is distributed among all the analyzed parameters, the most relevant being the fundamental frequency, F0, and the formant frequencies.

The source model of the LP synthesizer has been modified to favor speaker transformation. It has been verified that the proposed source model provides quality equivalent to that of the synthesizer that uses CELP (Code-Excited Linear Prediction) coded units.

Objective and subjective tests have been carried out to evaluate the ability to transform the speaker and the quality of the synthesized voice. They show that the technologies used are effective at changing the identity of the speaker, but a degradation in the quality of the synthetic voice is also observed.
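The two conversion techniques named above can be illustrated with a minimal sketch. The abstract does not specify the form of the linear transformation, the distance measure, or how the codebooks are trained, so everything here is an assumption made for illustration: a per-formant affine fit by least squares, and paired codebooks searched with a squared Euclidean distance. The function names (`fit_affine`, `convert_formants`, `codebook_map`) are hypothetical and are not taken from the thesis.

```python
def fit_affine(src, tgt):
    """Least-squares fit of tgt ~= a * src + b for one formant track.

    Assumption: each formant is converted independently by an affine map,
    fitted on time-aligned source/target values.
    """
    n = len(src)
    mean_s = sum(src) / n
    mean_t = sum(tgt) / n
    var = sum((s - mean_s) ** 2 for s in src)
    cov = sum((s - mean_s) * (t - mean_t) for s, t in zip(src, tgt))
    a = cov / var
    b = mean_t - a * mean_s
    return a, b


def convert_formants(frame, coeffs):
    """Apply one (a, b) pair per formant to a frame of formant frequencies."""
    return [a * f + b for f, (a, b) in zip(frame, coeffs)]


def nearest_index(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def codebook_map(vec, src_codebook, tgt_codebook):
    """Replace a source vector with the target codeword paired to its
    nearest source codeword. Assumes the codebooks are index-aligned."""
    return tgt_codebook[nearest_index(src_codebook, vec)]


if __name__ == "__main__":
    # Toy, made-up F1 values in Hz; not data from the thesis.
    a, b = fit_affine([500.0, 600.0, 700.0], [550.0, 660.0, 770.0])
    print(convert_formants([650.0], [(a, b)]))

    # Toy paired codebooks with two-dimensional source vectors.
    src_cb = [[0.0, 0.0], [2.0, 2.0], [1.1, 0.9]]
    tgt_cb = [[10.0], [20.0], [30.0]]
    print(codebook_map([1.0, 1.0], src_cb, tgt_cb))
```

In practice, the source and target frames would have to be time-aligned before fitting, and the codebooks would be learned from training data. Both steps are outside the scope of this sketch.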