2013
DOI: 10.5120/10053-4909

Nepali Text to Speech Synthesis System using ESNOLA Method of Concatenation

Abstract: This paper discusses the tools and methodology used to develop a Nepali Text to Speech Synthesis System. The system takes a concatenative approach based on the Epoch Synchronous Non Overlap Add (ESNOLA) method, and its speech database is a signal dictionary of raw sound signals that represent parts of phonemes. The developed system is an unintonated (flat) TTS system in which the pitch of the pre-recorded speech signal stays the same throughout, while taking care of aspects such as naturalness, personality, platfor…
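
The concatenative idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the dictionary contents, unit labels, and the function name `concatenate_units` are hypothetical, and the epoch detection and windowing that ESNOLA relies on are left out. It only shows stored segments being joined end to end, without overlap and without changing pitch, which is what makes the output flat.

```python
from typing import Dict, List, Sequence

# Hypothetical signal dictionary: unit label -> raw samples.
# In a real system each entry would be a recorded waveform segment
# (part of a phoneme) cut at epoch boundaries; these values are toy data.
SIGNAL_DICTIONARY: Dict[str, List[float]] = {
    "n": [0.0, 0.2, 0.1],
    "e": [0.0, 0.5, 0.3, 0.1],
    "p": [0.0, -0.1],
}


def concatenate_units(
    labels: Sequence[str], dictionary: Dict[str, List[float]]
) -> List[float]:
    """Join stored segments end to end, with no overlap and no pitch change."""
    output: List[float] = []
    for label in labels:
        if label not in dictionary:
            raise KeyError(f"no signal stored for unit {label!r}")
        # Non-overlapping add: each segment is appended after the previous one.
        output.extend(dictionary[label])
    return output


samples = concatenate_units(["n", "e", "p"], SIGNAL_DICTIONARY)
```

Because the segments are used as recorded, the output length is the sum of the segment lengths, and any prosody would have to come from a separate processing stage.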

citations
Cited by 5 publications
(2 citation statements)
references
References 3 publications
“…According to [15], "The vocalized form of human communication is termed as audio, each of our spoken word is created out of phonetic combination of a limited set of vowel and consonant audio, which are the sound units in audio synthesis" Even speaking the exact same word(s), with different speed, loudness, pitch, and accent, including both cultural and age-related differences, may lead to different results.…”
Section: Audio Based Feature Extraction
Citation type: mentioning
confidence: 99%