Deep Learning Techniques in Tandem with Signal Processing Cues for Phonetic Segmentation for Text to Speech Synthesis in Indian Languages

Baby, Arun; Prakash, Jeena J.; Vignesh, S.; Murthy, Hema A.

doi:10.21437/interspeech.2017-666

Cited by 9 publications (2 citation statements); references 14 publications


“…In Figure 3, we represent the research work related to audio segmentation, which mainly started in 2005. For about a decade, this type of research grew very slowly, but after 2016 there was a sharp rise in this area [17]. The exponential increase in audio segmentation research can be seen from 2017 onward.…”

Section: Research Trends In Web Of Science Database For Audio Segment... (mentioning, confidence: 99%)

Audio Segmentation Techniques and Applications Based on Deep Learning

Aggarwal, Vasukidevi, Selvakanmani, et al. (2022)

Scientific Programming


Audio processing has become an inseparable part of modern applications in domains ranging from health care to speech-controlled devices. Deep learning plays a vital role in automated audio segmentation, which is the subject of this article. Audio segmentation divides a digital audio signal into a sequence of segments or frames and then classifies them into classes such as speech, music, or noise. Segmentation plays an important role in audio signal processing. The most important requirement when training a deep learning network is a large amount of high-quality data. In this study, application areas, citation records, year-wise publication counts, and source-wise analyses are computed using the Scopus and Web of Science (WoS) databases. The analysis presented in this paper supports and establishes the significance of deep learning techniques in audio segmentation.
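The frame-then-classify pipeline described in this abstract can be sketched in a few lines of Python. This is a minimal sketch, not the surveyed authors' method: a fixed short-time-energy threshold stands in for a trained deep learning classifier, only two hypothetical classes (`speech`, `silence`) are used, and the function names, frame length, hop size and threshold are all illustrative assumptions.

```python
from typing import List, Tuple


def frame_signal(x: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a 1-D signal into frames of `frame_len` samples every `hop` samples
    (a trailing partial frame is dropped)."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def frame_energy(frame: List[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def classify_frames(frames: List[List[float]], threshold: float) -> List[str]:
    # Hypothetical stand-in for a trained classifier: a fixed energy threshold.
    return ["speech" if frame_energy(f) >= threshold else "silence" for f in frames]


def merge_segments(labels: List[str], hop: int, frame_len: int) -> List[Tuple[int, int, str]]:
    """Merge runs of identical frame labels into (start_sample, end_sample, label)."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the list or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            end = (i - 1) * hop + frame_len
            segments.append((start * hop, end, labels[start]))
            start = i
    return segments


if __name__ == "__main__":
    # Toy signal: 8 silent samples, 8 "voiced" samples, 8 silent samples.
    signal = [0.0] * 8 + [0.5, -0.5] * 4 + [0.0] * 8
    frames = frame_signal(signal, frame_len=4, hop=4)
    labels = classify_frames(frames, threshold=0.01)
    print(merge_segments(labels, hop=4, frame_len=4))
    # [(0, 8, 'silence'), (8, 16, 'speech'), (16, 24, 'silence')]
```

In a real system `classify_frames` would be replaced by a trained network operating on richer per-frame features; the framing and segment-merging steps stay essentially the same.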


“…With syllable and phone labels, phone alignments of speech wave files are obtained using signal processing cues in tandem with deep learning techniques [13]. First, syllable-level boundaries are determined with the help of spectral cues [26].…”

Section: Segmenting Speech Data (mentioning, confidence: 99%)

Code-switching in Indic Speech Synthesisers

et al. 2018 (self-citation)


Most Indians are inherently bilingual or multilingual owing to the diverse linguistic culture in India. As a result, code-switching is quite common in conversational speech. The objective of this work is to train good-quality text-to-speech (TTS) synthesisers that can seamlessly handle code-switching. To achieve this, bilingual TTS systems capable of handling phonotactic variations across languages are trained on combinations of monolingual data in a unified framework. In addition to segmenting Indic speech data using signal processing cues in tandem with a hidden Markov model-deep neural network (HMM-DNN), we propose to segment Indian English data with the same approach after NIST syllabification. Bilingual HTS-STRAIGHT-based systems are then trained with the order of the training data randomized, so that systematic interactions between the two languages are captured better. Experiments are conducted on three language pairs: Hindi+English, Tamil+English and Hindi+Tamil. The code-switched systems are evaluated on monolingual, code-mixed and code-switched texts. The degradation mean opinion score (DMOS) for monolingual sentences shows only marginal degradation relative to an equivalent monolingual TTS system, while the DMOS for bilingual sentences is significantly better than that of the corresponding monolingual TTS systems.
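The quoted citation statement and this abstract both describe first locating syllable boundaries from signal processing cues before refining phone alignments with HMM-DNN models. The sketch below is a simplified stand-in, not the cited method: it assumes syllable boundaries roughly coincide with valleys of a smoothed short-term energy contour and marks strict local minima as candidate boundaries. All function names and parameters are hypothetical.

```python
from typing import List


def short_term_energy(x: List[float], frame_len: int, hop: int) -> List[float]:
    """Mean squared amplitude of each frame (trailing partial frame dropped)."""
    return [
        sum(s * s for s in x[i:i + frame_len]) / frame_len
        for i in range(0, len(x) - frame_len + 1, hop)
    ]


def moving_average(x: List[float], width: int) -> List[float]:
    """Centered moving average over `width` frames (use an odd width);
    the window shrinks at the edges."""
    half = width // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def valley_boundaries(energy: List[float], width: int = 3) -> List[int]:
    """Frame indices of strict local minima of the smoothed energy contour,
    used here as candidate syllable boundaries.
    Multiply an index by the hop size to get a sample position."""
    e = moving_average(energy, width)
    return [i for i in range(1, len(e) - 1) if e[i] < e[i - 1] and e[i] < e[i + 1]]


if __name__ == "__main__":
    # Toy contour with two energy "nuclei"; the valley between them is frame 4.
    contour = [1.0, 5.0, 9.0, 5.0, 1.0, 5.0, 9.0, 5.0, 1.0]
    print(valley_boundaries(contour))  # [4]
```

Strict minima miss flat valleys, and an energy contour is only a rough proxy for the spectral cues the cited work relies on; candidate boundaries like these would normally be refined by a subsequent alignment stage.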

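The abstract's step of training bilingual systems on monolingual corpora with the data order randomized can be illustrated by pooling and shuffling utterance lists. This is a hedged sketch assuming utterances are identified by ID strings and that a seeded shuffle of the pooled list is sufficient; the utterance IDs, language tags and the `mix_corpora` helper are hypothetical and not taken from the authors' HTS-STRAIGHT setup.

```python
import random
from typing import List, Tuple


def mix_corpora(
    lang_a: List[str], lang_b: List[str], tag_a: str, tag_b: str, seed: int = 0
) -> List[Tuple[str, str]]:
    """Pool two monolingual utterance lists, tag each utterance with its
    language, and shuffle so the two languages are interleaved in training."""
    pooled = [(utt, tag_a) for utt in lang_a] + [(utt, tag_b) for utt in lang_b]
    rng = random.Random(seed)  # private, seeded generator: reproducible order
    rng.shuffle(pooled)
    return pooled


if __name__ == "__main__":
    hindi = ["hin_0001", "hin_0002", "hin_0003"]  # hypothetical utterance IDs
    english = ["eng_0001", "eng_0002"]
    for utt, lang in mix_corpora(hindi, english, "hi", "en", seed=1):
        print(lang, utt)
```

Seeding a private generator keeps the training order reproducible across runs without affecting any other use of the `random` module.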