Hybrid statistical pronunciation models designed to be trained by a medium-size corpus

Vazirnezhad, Bahram; Almasganj, Farshad; Ahadi, Seyed Mohammad

doi:10.1016/j.csl.2008.02.001

Cited by 10 publications

(9 citation statements)

References 17 publications

“…It can also be highlighted that syllable-based features are in the top of the list. These conclusions are consistent with previous studies [18,3,1].…”

Section: Linguistic Feature Selection (supporting)

confidence: 83%

“…[2][3][4], while linguistic features can be derived from textual data (distinction between content and function words, word predictability, syllable locations, lexical stress, etc.) [18,3,4]. Recently, [6] presented a deep study on the combination of both types of features, including even others like age and gender.…”

Section: Introduction (mentioning)

confidence: 99%

“…While early work has mostly concentrated on using phonological rules extracted from data to create alternative pronunciations [17,8], most recent techniques are machine learning approaches. Notably, decision trees [7,18], random forests [6], neural networks [5,10], hidden Markov models [16], and CRFs [10] have been investigated. In [18], decision trees and statistical contextual rules are even combined.…”

Section: Introduction (mentioning)

confidence: 99%

“…Notably, decision trees [7,18], random forests [6], neural networks [5,10], hidden Markov models [16], and CRFs [10] have been investigated. In [18], decision trees and statistical contextual rules are even combined. Alternatively, [11] proposed to produce accented pronunciations by interpolating different grapheme-to-phoneme models.…”

Section: Introduction (mentioning)

confidence: 99%

Probabilistic Speaker Pronunciation Adaptation for Spontaneous Speech Synthesis Using Linguistic Features

Qader, Lecorvé, Lolive, et al. 2015

Statistical Language and Speech Processing

Abstract. Pronunciation adaptation consists in predicting pronunciation variants of words and utterances based on their standard pronunciation and a target style. This is a key issue in text-to-speech as those variants bring expressiveness to synthetic speech, especially when considering a spontaneous style. This paper presents a new pronunciation adaptation method which adapts standard pronunciations to the style of individual speakers in a context of spontaneous speech. Its originality and strength are to solely rely on linguistic features and to consider a probabilistic machine learning framework, namely conditional random fields, to produce the adapted pronunciations. Features are first selected in a series of experiments, then combined to produce the final adaptation method. Backend experiments on the Buckeye conversational English speech corpus show that adapted pronunciations significantly better reflect spontaneous speech than standard ones, and that even better could be achieved if considering alternative predictions.

“…They were shown to be relevant for pronunciation modelling with decision trees [32] or Bayesian networks [33]. Linguistic, phonological and articulatory features can be directly derived from textual data, such as distinction between content and function words, word predictability or syllable locations [34], [35], [36]. Syllable-based features, among them schwas and liaisons, have also been investigated for pronunciation variants in French [37], [38].…”

Section: Studies On Pronunciation Variants Modelling (mentioning)

confidence: 99%

Can We Generate Emotional Pronunciations for Expressive Speech Synthesis?

Tahon, Lecorvé, Lolive, 2020

IEEE Trans. Affective Comput.

Abstract. In the field of expressive speech synthesis, a lot of work has been conducted on suprasegmental prosodic features while few has been done on pronunciation variants. However, prosody is highly related to the sequence of phonemes to be expressed. This article raises two issues in the generation of emotional pronunciations for TTS systems. The first issue consists in designing an automatic pronunciation generation method from text, while the second issue addresses the very existence of emotional pronunciations through experiments conducted on emotional speech. To do so, an innovative pronunciation adaptation method which automatically adapts canonical phonemes first to those labeled in the corpus used to create a synthetic voice, then to those labeled in an expressive corpus, is presented. This method consists in training conditional random fields pronunciation models with prosodic, linguistic, phonological and articulatory features. The analysis of emotional pronunciations reveals strong dependencies between prosody and phoneme assimilation or elisions. According to perceptual tests, the double adaptation allows to synthesize expressive speech samples of good quality, but emotion-specific pronunciations are too subtle to be perceived by testers.
