2016
DOI: 10.1016/j.specom.2016.03.003

Acoustic data-driven grapheme-to-phoneme conversion in the probabilistic lexical modeling framework

Abstract: One of the primary steps in building automatic speech recognition (ASR) and text-to-speech systems is the development of a phonemic lexicon that maps each word to its pronunciation as a sequence of phonemes. Phoneme lexicons can be developed by humans using linguistic knowledge; however, this is a costly and time-consuming task. To facilitate this process, grapheme-to-phoneme conversion (G2P) techniques are used in which given an initial phoneme lexicon, the relationshi…

Cited by 12 publications (11 citation statements)
References: 28 publications
“…An accurate measurement of brevity would require detailed acoustical information that is missing in raw written transcripts [10] or using more sophisticated methods of computation, for instance, to calculate number of phonemes and syllables according to [1]. However, the relationship between the duration of phonemes and graphemes is well-known and in general longer words has longer durations: grapheme-to-phoneme conversion is still a hot topic of research, due to the ambiguity of graphemes with respect to their pronunciation that today supposes a difficulty in speech technologies [18]. In order to improve the frequency measure, we would consider the use of alternative databases, e.g., the frequency of English words in Wikipedia [11].…”
Section: Results (mentioning)
confidence: 99%
“…In both cases though the performance is statistically comparable. This trend is more attributed to the fact that acoustic G2P conversion approach typically leads to acoustically confusable substitutions [20], which a discriminative acoustic model (ANN) seems to handle better than a generative acoustic model (GMM). Finally, the best performance of 93.1%…”
Section: Results and Analysis (mentioning)
confidence: 99%
“…In case of DTs, the estimates are Kronecker delta distributions [20], as DTs map a central grapheme with contextual information deterministically onto a phoneme.…”
Section: Multi-stream Combination (mentioning)
confidence: 99%