2015 International Conference on Signal Processing and Communication Engineering Systems
DOI: 10.1109/spaces.2015.7058279

Methodology for designing and creating Hindi speech corpus

Abstract: In this paper we describe the methodologies used in data collection and recording for our Hindi text-to-speech system. The design of the speech corpus plays a very important role in the overall quality of a text-to-speech system. A large text corpus of one million words was created for the existing text-to-speech system. We crawled text from many domains, such as finance, government, and current news, and combined it with pre-built dictionaries. For the very first time, we have also generated and incorporated…

Cited by 11 publications (3 citation statements). References 3 publications.
“…According to [11] the major challenges when it comes to speech corpus creation is segmenting audio data into sentences. [12] presented the methodology the used for designing and creating Hindi speech corpus. The methodology involved crawling text, filtering, recording and annotation phases.…”
Section: Experimental Design Materials and Methods (mentioning)
confidence: 99%
“…Recordings were collected from old people with the help of a microphone [36]. A comprehensive study was carried out on Hindi [37]. In this study, recordings were captured in a studio environment.…”
Section: Related Work (mentioning)
confidence: 99%
“…If all these words are processed without filtering, then there may be chance that irrelevant words are part of the corpus recording script, and those words are not adding value to the system. Also invalid words are very difficult to pronounce so filtering is very important step [13]. As text corpus size is very huge so manual checking is difficult and time consuming so algorithm are developed to create phonetically balanced optimized words [8] [9].…”
Section: Speech Corpus Design and Development (mentioning)
confidence: 99%