2018
DOI: 10.3390/mti2010009

Interactive Hesitation Synthesis: Modelling and Evaluation

Abstract: Conversational spoken dialogue systems that interact with the user rather than merely reading the text can be equipped with hesitations to manage dialogue flow and user attention. Based on a series of empirical studies, we elaborated a hesitation synthesis strategy for dialogue systems, which inserts hesitations of a scalable extent wherever needed in the ongoing utterance. Previously, evaluations of hesitation systems have shown that synthesis quality is affected negatively by hesitations, but that t…

Cited by 20 publications (20 citation statements)
References 32 publications
“…One practical application of the results obtained here is the extension of the hesitation insertion model for speech synthesis, which has been prototypically tested in Betz et al (2018), which did not yet take into account the structural interplay of silences and fillers. Furthermore, the hesitation model by Betz et al (2018) is centered on lengthening, which provides an elegant entry point for a synthetic hesitation interval, and reflects human speech production by making use of the pre-planned, but not-yet-uttered words in the articulatory buffer (Levelt, 1989). This approach receives support by the confirmed notion of longer silences after um-type fillers: the presence of a nasal sound makes this type of filler a better candidate to smoothly initiate a hesitation interval by lengthening compared to the uh-type fillers (for hesitation lengthening distribution over phone types, cf.…”
Section: Application and Outlook (mentioning)
confidence: 99%
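The passage above describes a hesitation model that opens the hesitation interval with lengthening and then continues with fillers and silences, noting that longer silences follow um-type fillers. A minimal sketch of such a staged, scalable insertion might look as follows, assuming a hesitation is requested as a single target duration in seconds. The names `plan_hesitation` and `HesitationPlan`, all duration constants, and the rule that reserves "um" for longer intervals are hypothetical illustrations, not the implementation of Betz et al. (2018).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical duration constants in seconds (illustrative only, NOT values
# from Betz et al. 2018).
MAX_LENGTHENING = 0.15                       # cap on extra duration from lengthening
FILLER_DURATIONS = {"uh": 0.25, "um": 0.35}  # assumed filler durations
MIN_SILENCE_AFTER_UM = 0.2                   # "um" only if a long pause follows


@dataclass
class HesitationPlan:
    lengthening: float = 0.0        # extra duration on the pre-hesitation segment
    filler: Optional[str] = None    # "uh", "um", or None
    silence: float = 0.0            # silent pause closing the interval

    def total(self) -> float:
        """Total duration of the planned hesitation interval."""
        filler_dur = FILLER_DURATIONS[self.filler] if self.filler else 0.0
        return self.lengthening + filler_dur + self.silence


def plan_hesitation(target: float) -> HesitationPlan:
    """Spread a requested hesitation duration (seconds) over lengthening,
    an optional filler and a trailing silence, in that order."""
    if target < 0:
        raise ValueError("target duration must be non-negative")
    plan = HesitationPlan()
    # Stage 1: lengthening opens the hesitation interval.
    plan.lengthening = min(target, MAX_LENGTHENING)
    remaining = target - plan.lengthening
    if remaining <= 0:
        return plan
    # Too little time left for a filler: close the interval with a pause only.
    if remaining < FILLER_DURATIONS["uh"]:
        plan.silence = remaining
        return plan
    # Stage 2: pick a filler; "um" is reserved for intervals long enough to be
    # followed by a longer silence, "uh" is used otherwise.
    if remaining >= FILLER_DURATIONS["um"] + MIN_SILENCE_AFTER_UM:
        plan.filler = "um"
    else:
        plan.filler = "uh"
    # Stage 3: whatever time is left becomes silence.
    plan.silence = remaining - FILLER_DURATIONS[plan.filler]
    return plan


for t in (0.1, 0.3, 0.5, 1.0):
    print(t, plan_hesitation(t))
```

Scaling the target duration up thus adds elements in a fixed order, so a short hesitation is realized as lengthening alone while a long one grows to lengthening, filler and pause.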
“…This insight is mainstream for related domains such as the evaluation of dialogue systems, where perceived system quality cannot be meaningfully assessed in a decontextualized fashion [3,4]. First evidence supporting this claim also for the domain of TTS evaluation has been produced by [5], who show that the same TTS material is rated differently in a crowdsourced, noninteractive MOS rating, and an MOS rating following an interaction between a human and a virtual agent in a collaborative task. Despite these insights, a meta analysis [6] revealed that the vast majority of TTS evaluations remain to rely on decontextualized listening tests, where participants score the quality of isolated sentences rather than embedding them within realistic applications or meaningful interactions.…”
Section: Introduction (mentioning)
confidence: 98%
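The comparison described above rests on the mean opinion score (MOS), the arithmetic mean of listener ratings on a 1 to 5 scale, computed separately for each rating context of the same TTS material. A minimal sketch, assuming integer ratings grouped by a context label; the function names and the example ratings are made up for illustration and are not data from [5].

```python
from statistics import mean
from typing import Dict, List


def mos(ratings: List[int]) -> float:
    """Mean opinion score: the arithmetic mean of 1-5 ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    return float(mean(ratings))


def mos_by_context(ratings: Dict[str, List[int]]) -> Dict[str, float]:
    """Compute one MOS per evaluation context for the same TTS material."""
    return {context: mos(scores) for context, scores in ratings.items()}


# Made-up ratings for illustration only (NOT results from [5]).
example = {
    "crowdsourced, isolated": [4, 4, 3, 5, 4],
    "after interaction": [3, 4, 3, 3, 4],
}
scores = mos_by_context(example)
print(scores)
print("difference:", scores["crowdsourced, isolated"] - scores["after interaction"])
```

Keeping the stimuli fixed and varying only the context is what allows a difference between the two scores to be attributed to the rating situation rather than to the audio.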
“…In a similar vein, recent times have seen an increasing number of papers criticizing traditional approaches to TTS evaluation [7,5], or pointing out frequent methodological flaws such as the low validity of most TTS evaluations due to small participant numbers and a lack of diversity in the tested listener groups, especially in the light of vast individual differences between listeners [8,9], which shows stronger for some traits (age, human-likeness) than others (gender, accent origin) [10]. Generally, these investigations point out the necessity for a better conceptual framing of the perception tasks, together with larger test populations and more careful statistical approaches.…”
Section: Introduction (mentioning)
confidence: 99%
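The criticism of small participant numbers can be made concrete with the width of a confidence interval around a MOS, which shrinks only with the square root of the number of ratings. A minimal sketch, assuming independent ratings and a normal approximation (z of about 1.96 for 95%); with few listeners a t-quantile would give a wider interval, and repeated ratings from the same listener are not independent, which is one reason the cited work calls for more careful statistical approaches. The names `ci_halfwidth` and `mos_with_ci` and the standard deviation of 1.0 are hypothetical.

```python
import math
from statistics import NormalDist, stdev
from typing import List, Tuple

Z_95 = NormalDist().inv_cdf(0.975)  # two-sided 95% normal quantile, about 1.96


def ci_halfwidth(sd: float, n: int, z: float = Z_95) -> float:
    """Half-width of a normal-approximation confidence interval for a mean."""
    if n < 1:
        raise ValueError("n must be positive")
    return z * sd / math.sqrt(n)


def mos_with_ci(ratings: List[int]) -> Tuple[float, float]:
    """Return (MOS, 95% half-width) for one condition, treating ratings as independent."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings for a standard deviation")
    m = sum(ratings) / len(ratings)
    return m, ci_halfwidth(stdev(ratings), len(ratings))


# With a hypothetical rating SD of 1.0 scale points, the interval only halves
# when the number of ratings is quadrupled.
for n in (10, 40, 160):
    print(n, round(ci_halfwidth(1.0, n), 3))
```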
“…One practical application of the results obtained here is the extension of the hesitation insertion model for speech synthesis, which has been prototypically tested in Betz et al (2018), which did not yet take into account the structural interplay of silences and fillers. Furthermore, the hesitation model by Betz et al (2018) is centered on lengthening, which provides an elegant entry point for a synthetic hesitation interval, and reflects human speech production by making use of the pre-planned, but not-yet-uttered words in the articulatory buffer . This approach receives support by the confirmed notion of longer silences after um-type fillers: the presence of a nasal sound makes this type of filler a better candidate to smoothly initiate a hesitation interval by lengthening compared to the uh-type fillers (for hesitation lengthening distribution over phone types, cf.…”
Section: Application and Outlook (mentioning)
confidence: 99%