2002
DOI: 10.1006/csla.2002.0190
|View full text |Cite
|
Sign up to set email alerts
|

Selection of the most significant parameters for duration modelling in a Spanish text-to-speech system using neural networks

Abstract: Accurate prediction of segmental duration from text in a text-to-speech system is difficult for several reasons. One which is especially relevant is the great quantity of contextual factors that affect timing and it is difficult to find the right way to model them. There are many parameters that affect duration, but not all of them are always relevant and some can even be counterproductive because of the possibility of overtraining.The main motivation of this paper has been to reduce the error in the duration … Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1

Citation Types

0
8
0
3

Year Published

2007
2007
2022
2022

Publication Types

Select...
3
3
2

Relationship

0
8

Authors

Journals

citations
Cited by 13 publications
(11 citation statements)
references
References 18 publications
0
8
0
3
Order By: Relevance
“…The speech synthesis module is a male voice text to speech system developed in by GTH-UPM (BORIS [3]). This module uses a diaphone unit concatenating algorithm, able to modify the speaking rate and speaker pitch.…”
Section: Response Generation and Speech Synthesismentioning
confidence: 99%
“…The speech synthesis module is a male voice text to speech system developed in by GTH-UPM (BORIS [3]). This module uses a diaphone unit concatenating algorithm, able to modify the speaking rate and speaker pitch.…”
Section: Response Generation and Speech Synthesismentioning
confidence: 99%
“…Several models based on neural network principles are described in the literature for predicting the durations of syllables in continuous speech (Campbell 1990;Campbell & Isard 1991;Campbell 1992Campbell , 1993Barbosa & Bailly 1994Cordoba et al 1999;Hifny & Rashwan 2002;Sonntag et al 1997;Teixeira & Freitas 2003). Campbell (1992) used a feedforward neural network trained with feature vectors, each representing six features of a syllable.…”
Section: Input Layermentioning
confidence: 99%
“…Duration modeling techniques can be divided into two major categories, the rule-based approaches [3,4,5] and the corpus-based [6,7,8,9]. With regard to rule-based approaches the most well known and widely used is the one proposed by Dennis Klatt [3].…”
Section: Introductionmentioning
confidence: 99%
“…Concerning machine learning based duration models the need of a large annotated speech corpus is mandatory. In previous research such models have been constructed with the application of machine learning approaches such as decision trees [6], Artificial Neural Networks [7], Bayesian models [9] or linear statistical models [8].…”
Section: Introductionmentioning
confidence: 99%