Interspeech 2017
DOI: 10.21437/interspeech.2017-666

Deep Learning Techniques in Tandem with Signal Processing Cues for Phonetic Segmentation for Text to Speech Synthesis in Indian Languages

Abstract: Automatic detection of phoneme boundaries is an important sub-task in building speech processing applications, especially text-to-speech synthesis (TTS) systems. The main drawback of Gaussian mixture model-hidden Markov model (GMM-HMM) based forced alignment is that the phoneme boundaries are not explicitly modeled. In an earlier work, we proposed the use of signal processing cues in tandem with GMM-HMM based forced alignment for boundary correction when building Indian language TTS systems. In this pap…

Cited by 9 publications (2 citation statements)
References 14 publications
“…In Figure 3, we represent the research work related to audio segmentation that started majorly in 2005. Until a decade, there was a very slow increase in this type of research, but post-2016, there was a sharp rise in this area of research [17]. The exponential increase in research trends can be seen in audio segmentation-related research since 2017.…”
Section: Research Trends In Web Of Science Database For Audio Segment... (mentioning)
confidence: 99%
“…With syllable and phone labels, phone alignments of speech wavefiles are obtained using signal processing cues in tandem with deep learning techniques [13]. First, syllable level boundaries are determined with the help of spectral cues [26].…”
Section: Segmenting Speech Data (mentioning)
confidence: 99%