Brooke Stephenson scite author profile

Brooke Stephenson

5 Publications

26 Citation Statements Received

52 Citation Statements Given

Publications

Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input

Stephenson¹, Hueber², Girin³ et al. 2021

The prosody of a spoken word is determined by its surrounding context. In incremental text-to-speech synthesis, where the synthesizer produces an output before it has access to the complete input, the full context is often unknown which can result in a loss of naturalness in the synthesized speech. In this paper, we investigate whether the use of predicted future text can attenuate this loss. We compare several test conditions of next future word: (a) unknown (zero-word), (b) language model predicted, (c) randomly predicted and (d) ground-truth. We measure the prosodic features (pitch, energy and duration) and find that predicted text provides significant improvements over a zero-word lookahead, but only slight gains over random-word lookahead. We confirm these results with a perceptive test.

What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS

Stephenson¹, Besacier², Girin³ et al. 2020

In incremental text to speech synthesis (iTTS), the synthesizer produces an audio output before it has access to the entire input sentence. In this paper, we study the behavior of a neural sequence-to-sequence TTS system when used in an incremental mode, i.e. when generating speech output for token n, the system has access to n + k tokens from the text sequence. We first analyze the impact of this incremental policy on the evolution of the encoder representations of token n for different values of k (the lookahead parameter). The results show that, on average, tokens travel 88% of the way to their full context representation with a one-word lookahead and 94% after 2 words. We then investigate which text features are the most influential on the evolution towards the final representation using a random forest analysis. The results show that the most salient factors are related to token length. We finally evaluate the effects of lookahead k at the decoder level, using a MUSHRA listening test. This test shows results that contrast with the above high figures: speech synthesis quality obtained with 2 word-lookahead is significantly lower than the one obtained with the full sentence.

BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model

Stephenson¹, Besacier², Girin³ et al. 2022

Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input

Stephenson¹, Hueber², Girin³ et al. 2021

Preprint

What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS

Stephenson¹, Besacier², Girin³ et al. 2020

Preprint
