The processes of language demise take hold when a language ceases to belong to the mainstream of life's activities. Digital communication technology increasingly pervades all aspects of modern life. Languages not digitally 'available' are ever more marginalised, whereas a digital presence often yields unexpected opportunities to integrate the language into the mainstream. The ABAIR initiative embraces three central aspects of speech technology development for Irish (Gaelic): the provision of technology-oriented linguistic-phonetic resources; the building and perfecting of core speech technologies; and the development of technology applications, which exploit both the technologies and the linguistic resources. These applications enable the public, learners, and those with disabilities to integrate Irish into their day-to-day usage. This paper outlines some of the specific linguistic and sociolinguistic challenges and the approaches adopted to address them. Although machine-learning approaches are helping to speed up the process of technology provision, the ABAIR experience highlights how phonetic-linguistic resources are also crucial to the development process. For the endangered language, linguistic resources are central to many applications that impact on language usage. The sociolinguistic context and the needs of potential end users should be central considerations in setting research priorities and deciding on methods.
This work aims to improve text-to-speech synthesis for Wikipedia by advancing and implementing models of prosodic prominence. We propose a new system architecture with explicit prominence modeling and test the first component of the architecture. We automatically extract a phonetic feature related to prominence from the speech signal in the ARCTIC corpus. We then modify the label files and train an experimental TTS system based on the feature using Merlin, a statistical-parametric DNN-based engine. Test sentences with contrastive prominence at the word level are synthesised, and two separate listening tests are conducted: a) one evaluating the level of prominence control in the generated speech, and b) one evaluating naturalness. Our results show that the prominence feature-enhanced system successfully places prominence on the appropriate words and increases perceived naturalness relative to the baseline.
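The label-modification step described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the continuous per-word scores, the min-max quantization into three levels, the helper names `quantize_prominence` and `tag_labels`, and the `/PROM:` field appended to each label line are all hypothetical. The abstract does not say which feature is extracted or how the label files are changed.

```python
from typing import List


def quantize_prominence(scores: List[float], n_levels: int = 3) -> List[int]:
    """Map continuous per-word prominence scores to levels 0..n_levels-1.

    Assumption: simple min-max scaling within one utterance; the paper's
    actual feature and discretisation are not specified in the abstract.
    """
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All words equally prominent: assign the lowest level.
        return [0] * len(scores)
    levels = []
    for s in scores:
        norm = (s - lo) / (hi - lo)  # scale to [0, 1]
        levels.append(min(int(norm * n_levels), n_levels - 1))
    return levels


def tag_labels(label_lines: List[str], word_index: List[int],
               levels: List[int]) -> List[str]:
    """Append a hypothetical '/PROM:<level>' field to each label line.

    word_index[i] gives the index of the word that label line i belongs to.
    """
    return [f"{line}/PROM:{levels[w]}"
            for line, w in zip(label_lines, word_index)]


if __name__ == "__main__":
    levels = quantize_prominence([0.1, 0.9, 0.4])
    # Placeholder label strings; real label files carry much richer context.
    for line in tag_labels(["lab_a", "lab_b", "lab_c"], [0, 1, 2], levels):
        print(line)
```

In a setup like this, the training toolkit would also need a matching entry in its question or feature definition so that the added field is actually read as an input feature; how that is configured depends on the toolkit and is not shown here.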
A popular idea in Computer Assisted Language Learning (CALL) is to use
multimodal annotated texts, with annotations typically including embedded
audio and translations, to support L2 learning through reading. An important
question is how to create the audio, which can be done either through human
recording or by a Text-To-Speech (TTS) synthesis engine. We may reasonably
expect TTS to be quicker and easier, but human recordings to be of higher quality.
Here, we report a study using the open-source LARA platform and ten
languages. Samples of LARA audio totaling about three and a half minutes
were provided for each language in both human and TTS form; subjects used a
web form to compare different versions of the same item and rate the voices
as a whole. Although the human voice was more often preferred, TTS achieved
higher ratings in some languages and was close in others.