Progress in Speech Synthesis 1997
DOI: 10.1007/978-1-4612-1894-4_25

The Aligner: Text-to-Speech Alignment Using Markov Models

Cited by 34 publications (27 citation statements)
References 7 publications
“…Any site which uses ToBI labelled data to train an automatic speech recognition or speech synthesis system does this, and the emerging convention is to call such a projection a Phones tier. At many such sites, a first-pass Phones-tier labelling is done automatically using an alignment program, such as Aligner (Wightman and Talkin 1994) or some other similar HMM-based automatic transcription alignment system. For such sites, the Words-tier labels are then also derived automatically from the Phones alignment.…”
Section: Extensions of ToBI
confidence: 99%
“…In this context, there have been data-driven text analysis methods at home and abroad [9,10]. For example, using a hidden Markov model (HMM) and neural network methods [11,12].…”
Section: Figure 2 Speech Synthesis Methods
confidence: 99%
“…TIMIT (Garofolo 1988) is the most widely used corpus for phone segmentation, and has been established for this task (Brugnara et al. 1993; Wightman and Talkin 1997; Pellom and Hansen 1998; Aversano et al. 2001; Keshet et al. 2007). In brief, it consists of microphone quality recordings of 630 speakers of the 8 major American-English dialects, with sampling frequency 16 kHz and resolution of 16 bits per sample.…”
Section: Evaluation Database
confidence: 99%