We have previously proposed a method that enables non-parallel voice conversion (VC) by using a variant of generative adversarial networks (GANs) called StarGAN. The main features of our method, called StarGAN-VC, are as follows. First, it requires no parallel utterances, transcriptions, or time alignment procedures for speech generator training. Second, it simultaneously learns mappings across multiple domains using a single generator network, so it can fully exploit training data collected from multiple domains to capture latent features common to all of them. Third, it generates converted speech signals quickly enough for real-time implementations and requires only a few minutes of training examples to produce reasonably realistic-sounding speech. In this paper, we describe three formulations of StarGAN, including a newly introduced variant called "Augmented classifier StarGAN (A-StarGAN)", and compare them in a non-parallel VC task. We also compare them with several baseline methods.
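The following is a minimal sketch, not the authors' implementation, of the second feature above: a single generator that serves every domain by conditioning on a one-hot speaker code, in the spirit of StarGAN-VC. All layer sizes and the names N_SPEAKERS and FEAT_DIM are illustrative assumptions, not values from the paper.

```python
# Sketch of a single multi-domain conditional generator (assumed design,
# not the paper's architecture). The target speaker's one-hot code is
# broadcast over time and concatenated to the input features.
import torch
import torch.nn as nn

N_SPEAKERS = 4   # number of domains (speakers); assumption
FEAT_DIM = 36    # acoustic features per frame, e.g. mel-cepstra; assumption

class ConditionalGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(FEAT_DIM + N_SPEAKERS, 128, kernel_size=5, padding=2),
            nn.GLU(dim=1),  # gated linear unit halves channels to 64
            nn.Conv1d(64, FEAT_DIM, kernel_size=5, padding=2),
        )

    def forward(self, x, target_code):
        # x: (batch, FEAT_DIM, frames); target_code: (batch, N_SPEAKERS)
        code = target_code.unsqueeze(-1).expand(-1, -1, x.size(-1))
        return self.net(torch.cat([x, code], dim=1))

gen = ConditionalGenerator()
feats = torch.randn(2, FEAT_DIM, 128)               # dummy feature batch
tgt = torch.eye(N_SPEAKERS)[torch.tensor([1, 3])]   # one-hot target codes
converted = gen(feats, tgt)                         # (2, FEAT_DIM, 128)
```

Because the domain enters only through the conditioning code, one set of generator weights is trained on data from all speakers at once, which is what lets the model share latent features across domains.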
Non-parallel voice conversion (VC) is a technique for learning a mapping from source to target speech without relying on parallel data. This is an important but challenging task, since no aligned source-target utterance pairs are available to supervise training. Recently, CycleGAN-VC provided a breakthrough, performing comparably to a parallel VC method without relying on any extra data, modules, or time alignment procedures. However, a large gap remains between real target speech and converted speech, and bridging this gap is still a challenge. To reduce it, we propose CycleGAN-VC2, an improved version of CycleGAN-VC that incorporates three new techniques: an improved objective (two-step adversarial losses), an improved generator (2-1-2D CNN), and an improved discriminator (PatchGAN). We evaluated our method on a non-parallel VC task and analyzed the effect of each technique in detail. An objective evaluation showed that these techniques help bring the converted feature sequence closer to the target in terms of both global and local structure, assessed with Mel-cepstral distortion and modulation spectra distance, respectively. A subjective evaluation showed that CycleGAN-VC2 outperforms CycleGAN-VC in both naturalness and similarity for every speaker pair, including intra-gender and inter-gender pairs.
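As an illustration of the third technique above, here is a minimal sketch, assuming a 2D feature "image" layout, of a PatchGAN-style discriminator of the kind CycleGAN-VC2 adopts: rather than emitting a single real/fake score per utterance, it emits a grid of scores, each judging a local time-frequency patch. The layer widths and input shape below are assumptions for illustration, not the paper's configuration.

```python
# Sketch of a PatchGAN discriminator (assumed layer sizes). Each output
# logit covers only a local receptive field, so the adversarial signal
# constrains local structure of the converted features.
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    def __init__(self, in_channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            # Head: one real/fake logit per spatial patch, not per input.
            nn.Conv2d(128, 1, kernel_size=3, stride=1, padding=1),
        )

    def forward(self, x):
        # x: (batch, 1, freq_bins, frames) -> (batch, 1, freq/4, frames/4)
        return self.net(x)

disc = PatchDiscriminator()
mcep = torch.randn(2, 1, 36, 128)   # dummy mel-cepstra treated as an image
patch_logits = disc(mcep)           # grid of per-patch real/fake logits
```

The design choice is that penalizing each patch separately pushes the generator to fix fine-grained artifacts, which is consistent with the abstract's claim of improvement in local structure.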
Eye and head morphology vary considerably among insects and even between closely related species of Drosophila. Species of the D. melanogaster subgroup, and other Drosophila species, exhibit a negative correlation between eye size and face width (FW); for example, D. mauritiana generally has bigger eyes, composed of larger ommatidia, and a correspondingly narrower face than its sibling species. To better understand the evolution of eye and head morphology, we investigated the genetic and developmental basis of differences in eye size and FW between male D. mauritiana and D. simulans. QTL mapping of eye size and FW showed that the major loci responsible for the interspecific variation in these traits are localized to different genomic regions. Introgression of the largest-effect QTL underlying the difference in eye size resulted in flies with larger eyes but no significant difference in FW. Moreover, introgression of a QTL region on the third chromosome that contributes to the FW difference between these species affected FW, but not eye size. We also observed that this difference in FW is detectable earlier in the development of the eye-antennal disc than the difference in the size of the retinal field. Our results suggest that different loci acting at different developmental stages underlie changes in eye size and FW. Therefore, while there is a negative correlation between these traits in Drosophila, we show genetically that they also have the potential to evolve independently, and this may help to explain the evolution of these traits in other insects.