Previous work on cross-lingual transfer learning in text-to-speech has shown the effectiveness of fine-tuning phonemic representations on small amounts of target language data. In other contexts, phonological features (PFs) have been suggested as a more suitable input representation than phonemes for sharing acoustic information between languages, for example in multilingual model training or for code-switching synthesis where an utterance may contain words from multiple languages. Starting from a model trained on 14 hours of English, we find that cross-lingual fine-tuning with 15 minutes of German data can produce speech with subjective naturalness ratings comparable to a model trained from scratch on 4 hours of German, using either phonemes or PFs. We also find a modest but statistically significant improvement in naturalness ratings using PFs over phonemes when training from scratch on 4 hours of German.
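To illustrate why PFs might share information across languages more readily than phoneme identities, the following is a minimal sketch that encodes each phoneme as a binary feature vector. The feature inventory, the phoneme table, and the function names are simplified assumptions made for illustration; they are not the feature set or code used in the work described above.

```python
# Hypothetical, heavily simplified feature inventory (an assumption for
# illustration; real PF sets are larger and more carefully defined).
FEATURES = ["consonantal", "voiced", "nasal", "labial", "high", "back", "round"]

# Hypothetical phoneme-to-feature table. /y/ (as in German "über") does not
# occur in English, but every feature it uses appears in some English phoneme.
PHONEME_FEATURES = {
    "p": {"consonantal", "labial"},
    "b": {"consonantal", "voiced", "labial"},
    "m": {"consonantal", "voiced", "nasal", "labial"},
    "i": {"voiced", "high"},
    "u": {"voiced", "high", "back", "round"},
    "y": {"voiced", "high", "round"},
}


def to_feature_vector(phoneme):
    """Return a binary vector with one entry per feature in FEATURES."""
    active = PHONEME_FEATURES[phoneme]
    return [1 if feature in active else 0 for feature in FEATURES]


def shared_features(a, b):
    """Count the feature positions on which two phonemes agree."""
    return sum(x == y for x, y in zip(to_feature_vector(a), to_feature_vector(b)))


if __name__ == "__main__":
    for phoneme in ("i", "u", "y"):
        print(phoneme, to_feature_vector(phoneme))
    print("y vs u agree on", shared_features("y", "u"), "of", len(FEATURES))
```

With a one-hot phoneme input, an unseen target-language phoneme such as /y/ would receive an untrained embedding; with a feature vector like this, it is composed of dimensions the source-language model has already seen.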
In this work we present an end-to-end pipeline for building a speech corpus and text-to-speech synthesis system for a new language without reference to any expert-defined linguistic resources. We segment and align over 85 hours of Scottish Gaelic recordings found online and select 2- and 8-hour subsets with comprehensive coverage of speech sounds based on self-supervised discrete acoustic unit sequences. We then compare FastPitch models trained on these relatively small data sets using character, acoustic unit and phone inputs. According to native speaker listening test judgements, characters serve well for Gaelic given its regular orthography, even in these limited data scenarios. We release our corpus building recipe so that others may easily apply our work to new languages.
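One way such coverage-driven subset selection could be approached is a greedy search over utterances. The sketch below is an assumption made for illustration, not the released recipe: it treats bigrams of discrete acoustic unit IDs as the coverage target and repeatedly picks the utterance that adds the most unseen bigrams per second until a duration budget is reached. The function name, the tuple layout, and the bigram criterion are all hypothetical.

```python
def select_subset(utterances, budget_seconds):
    """Greedily pick utterances that add the most new unit bigrams per second.

    utterances: list of (utt_id, duration_seconds, unit_sequence) tuples,
    where unit_sequence is a list of discrete acoustic unit IDs and
    duration_seconds is assumed to be positive.
    Returns the selected utterance IDs in the order they were chosen.
    """
    covered = set()      # unit bigrams already present in the subset
    selected = []
    total = 0.0
    remaining = list(utterances)

    while remaining:
        best, best_gain = None, 0.0
        for utt in remaining:
            _, duration, units = utt
            if total + duration > budget_seconds:
                continue  # would exceed the duration budget
            bigrams = set(zip(units, units[1:]))
            gain = len(bigrams - covered) / duration
            if gain > best_gain:
                best, best_gain = utt, gain
        if best is None:
            break  # nothing fits the budget or adds new coverage

        utt_id, duration, units = best
        selected.append(utt_id)
        covered |= set(zip(units, units[1:]))
        total += duration
        remaining.remove(best)

    return selected


if __name__ == "__main__":
    toy = [
        ("a", 2.0, [1, 2, 3]),
        ("b", 2.0, [1, 2, 3]),     # duplicates the coverage of "a"
        ("c", 3.0, [4, 5, 6, 7]),
    ]
    print(select_subset(toy, budget_seconds=5.0))
```

Normalising the gain by duration favours short, varied utterances, which suits a small fixed budget such as 2 or 8 hours; other criteria (for example, unigram or trigram coverage) would fit the same loop.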