The Aligner: Text-to-Speech Alignment Using Markov Models

Wightman, Colin W.; Talkin, David

doi:10.1007/978-1-4612-1894-4_25

Cited by 34 publications

(27 citation statements)

References 7 publications

“…Any site which uses ToBI labelled data to train an automatic speech recognition or speech synthesis system does this, and the emerging convention is to call such a projection a Phones tier. At many such sites, a first-pass Phones-tier labelling is done automatically using an alignment program, such as Aligner (Wightman and Talkin 1994) or some other similar HMM-based automatic transcription alignment system. For such sites, the Words-tier labels are then also derived automatically from the Phones alignment.…”

Section: Extensions of ToBI (mentioning)

confidence: 99%

The Original ToBI System and the Evolution of the ToBI Framework

Beckman, Hirschberg, 2005

This chapter presents an overview of the original ToBI system. It reviews the design of the original ToBI system and its foundations in basic and applied research. It describes the inter-disciplinary community of users and uses for which the system was intended, and it outlines how the consensus model of American English intonation and inter-word juncture was achieved by finding points of useful intersection among the research interests and knowledge embodied in this community. It thus identifies the practical principles for designing prosodic annotation conventions that emerged in the course of developing, testing, and using this particular system. The chapter also describes how the original ToBI conventions have been evolved to be the general annotation conventions for several other English varieties and for a number of other languages.

“…In this context, there have been based on data-driven text analysis methods at home and abroad [9,10]. For example, using hidden a Markov Model (HMM: Hidden Markov Model) and neural networks method (Neural Network Method) [11,12].…”

Section: Figure 2 Speech Synthesis Methods (mentioning)

confidence: 99%

Analysis of Tibetan-language Speech Technology

Bai, Tao, Wu et al. 2017

Proceedings of the 2017 7th International Conference on Education, Management, Computer and Society (EMCS 2017)

Abstract. This paper studies the speech technology (Speech Recognition and Text To Speech) for Tibetan. Recognition of Tibetan characters is a significant module of multi-language information processing system in China. Speech is the most convenient and natural way of communication. Owing to the special structure of Tibetan characters, the SR and TTS of traditional Tibetan characters face problems of low recognition rates and poor recognition effects. Through an in-depth study on the features of Tibetan characters, we finished this paper. This paper briefly introduces the development and the basic principles of speech technology, and then we provide an analysis of SR and TTS of Tibetan. We give the structure diagrams of the SR and the TTS technologies and introduction of the key module. Finally, we discuss the prospects of application in academic writing, demands and challenges of the Tibetan speech technology briefly in many areas.

“…TIMIT (Garofolo 1988) is the most widely used corpus for phone segmentation, and has been established for this task (Brugnara et al 1993; Wightman and Talkin 1997; Pellom and Hansen 1998; Aversano et al 2001; Keshet et al 2007). In brief, it consists of microphone quality recordings of 630 speakers of the 8 major American-English dialects, with sampling frequency 16 kHz and resolution of 16 bits per sample.…”

Section: Evaluation Database (mentioning)

confidence: 99%

Phonetic segmentation using multiple speech features

Mporas, Ganchev, Fakotakis, 2008

Int J Speech Technol

In this paper we propose a method for improving the performance of the segmentation of speech waveforms to phonetic units. The proposed method is based on the well known Viterbi time-alignment algorithm and utilizes the phonetic boundary predictions from multiple speech parameterization techniques. Specifically, we utilize the most appropriate, with respect to boundary type, phone transition position prediction as initial point to start Viterbi time-alignment for the prediction of the successor phonetic boundary. The proposed method was evaluated on the TIMIT database, with the exploitation of several, well known in the area of speech processing, Fourier-based and wavelet-based speech parameterization algorithms. The experimental results for the tolerance of 20 milliseconds indicated an improvement of the absolute segmentation accuracy of approximately 0.70%, when compared to the baseline speech segmentation scheme.
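
The "segmentation accuracy at a tolerance of 20 milliseconds" reported in the abstract above is usually read as the share of predicted phone boundaries that fall within 20 ms of their hand-labelled reference boundary. A minimal sketch of that reading follows. It is not the authors' evaluation code: the function name `boundary_accuracy`, the one-to-one pairing of boundaries, and the sample times are all assumptions made for illustration.

```python
def boundary_accuracy(predicted_ms, reference_ms, tolerance_ms=20):
    """Fraction of predicted boundaries within tolerance_ms of the reference.

    Assumes both lists describe the same phone sequence, so the i-th
    predicted boundary is compared with the i-th reference boundary.
    Times are in milliseconds.
    """
    if len(predicted_ms) != len(reference_ms):
        raise ValueError("boundary lists must have the same length")
    if not reference_ms:
        raise ValueError("at least one boundary is required")
    hits = sum(
        1
        for pred, ref in zip(predicted_ms, reference_ms)
        if abs(pred - ref) <= tolerance_ms
    )
    return hits / len(reference_ms)


# Hypothetical boundary times, invented for this example only.
reference = [100, 250, 400, 610]
predicted = [105, 275, 398, 600]
accuracy = boundary_accuracy(predicted, reference)
print(f"accuracy within 20 ms: {accuracy:.2%}")
```

In this example the second boundary is off by 25 ms, so it is the only one counted as a miss. Pairing boundaries by index fits forced alignment, where the phone sequence is fixed by the transcript; a segmenter that can insert or delete boundaries would need a matching step first.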
