In this paper, we describe the methodologies we used for data collection and recording for our Hindi text-to-speech (TTS) system. The design of the speech corpus plays a very important role in the overall quality of a TTS system. A large text corpus of one million words was created for the existing TTS system. We crawled text from many domains, such as finance, government and current news, and combined it with pre-built dictionaries. For the first time, we also generated and incorporated text from Hindi Short Messaging Service (SMS) messages. The aim was to build a generic speech corpus for Hindi. The crawled text was first filtered for correctness, e.g. spelling mistakes, validity as Hindi text and word length. The filtered words were then carefully analyzed to ensure that the prepared text was phonetically balanced. This curated text was then recorded by a professional voice artist in a studio environment. The recorded speech data was then processed and annotated to generate the final speech corpus. The paper explains the speech corpus creation process through its text crawling, filtering, recording and annotation phases. The final speech corpus is used in the Hindi TTS system, which achieves a mean opinion score (MOS) of 2.8.
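The filtering and selection steps can be pictured with a small sketch. It assumes that "valid Hindi" means every character lies in the Unicode Devanagari block (U+0900 to U+097F), uses hypothetical word-length bounds, treats an optional lexicon as the spelling check, and substitutes character bigrams for the phonetic units (phones or diphones) a real system would count. Greedy set cover is one common way to approximate phonetic coverage; the paper does not state that it used this method.

```python
import re
from typing import Iterable, List, Optional, Set

# Assumption: a word is "valid Hindi" if every code point lies in the
# Devanagari block (U+0900-U+097F). The length bounds are hypothetical.
DEVANAGARI = re.compile(r"[\u0900-\u097F]+")
MIN_LEN, MAX_LEN = 2, 20


def is_valid_word(word: str, lexicon: Optional[Set[str]] = None) -> bool:
    """Pure Devanagari, within length bounds (counted in code points) and,
    when a lexicon is supplied, spelled exactly as a lexicon entry."""
    if not DEVANAGARI.fullmatch(word):
        return False
    if not MIN_LEN <= len(word) <= MAX_LEN:
        return False
    return lexicon is None or word in lexicon


def filter_sentence(sentence: str, lexicon: Optional[Set[str]] = None) -> bool:
    """A sentence is kept only if it is non-empty and every word passes."""
    words = sentence.split()
    return bool(words) and all(is_valid_word(w, lexicon) for w in words)


def units(sentence: str) -> Set[str]:
    """Crude stand-in for phonetic units: character bigrams inside words."""
    found: Set[str] = set()
    for w in sentence.split():
        found.update(w[i:i + 2] for i in range(len(w) - 1))
    return found


def select_balanced(sentences: Iterable[str], budget: int) -> List[str]:
    """Greedy set cover: repeatedly take the sentence that adds the most
    not-yet-covered units; stop at the budget or when nothing new is added."""
    pool = list(sentences)
    covered: Set[str] = set()
    chosen: List[str] = []
    while pool and len(chosen) < budget:
        best = max(pool, key=lambda s: len(units(s) - covered))
        gain = units(best) - covered
        if not gain:
            break
        chosen.append(best)
        covered |= gain
        pool.remove(best)
    return chosen
```

For example, `select_balanced(["कक", "कक कक", "खग"], 5)` keeps `"कक"` and `"खग"` and drops `"कक कक"`, because it contributes no unit that is not already covered.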
Schwa deletion is an important factor in grapheme-to-phoneme conversion. In Hindi, each consonant carries a weak vowel called the inherent schwa, and this schwa is deleted in pronunciation in some cases. Written and spoken forms differ in Indian languages, and the schwa plays an important role in the spoken form: deleting or retaining the weak vowel decides how a word is pronounced. Word morphology is one of the main factors affecting pronunciation. In this paper, we describe schwa handling and the rules for schwa deletion and retention. Based on these rules, we developed a schwa deletion algorithm. The algorithm was tested on 6000 high-frequency words and achieved an accuracy of up to 80%. Based on these results, an application was developed that provides a user interface for the text processing component of the text-to-speech system.
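To make the idea of rule-based schwa deletion concrete, here is a minimal sketch. It is not the paper's rule set: it assumes a simplified, Ohala-style rule pair (delete a word-final schwa after a vowel-consonant sequence, then delete a medial schwa in a V C _ C V context, scanning right to left), works on a hypothetical romanized phoneme list in which `"@"` marks the inherent schwa, and ignores the morphological factors the paper identifies.

```python
from typing import List, Sequence

SCHWA = "@"  # hypothetical marker for the inherent schwa
# Assumed vowel inventory; any token not listed here is treated as a consonant.
VOWELS = {SCHWA, "aa", "i", "ii", "u", "uu", "e", "ai", "o", "au"}


def is_vowel(t: str) -> bool:
    return t in VOWELS


def delete_schwas(tokens: Sequence[str]) -> List[str]:
    toks = list(tokens)
    # Rule 1 (assumed): drop a word-final schwa preceded by V C,
    # so monosyllables such as ["n", "@"] keep their vowel.
    if len(toks) >= 3 and toks[-1] == SCHWA and not is_vowel(toks[-2]) and is_vowel(toks[-3]):
        toks.pop()
    # Rule 2 (assumed): drop a medial schwa in the context V C _ C V.
    # Scanning right to left means a deletion turns the next schwa's
    # following context into C C, which blocks a second, adjacent deletion.
    for i in range(len(toks) - 1, -1, -1):
        if (toks[i] == SCHWA and 2 <= i <= len(toks) - 3
                and not is_vowel(toks[i - 1]) and is_vowel(toks[i - 2])
                and not is_vowel(toks[i + 1]) and is_vowel(toks[i + 2])):
            del toks[i]
    return toks


def render(tokens: Sequence[str]) -> str:
    """Join tokens, spelling a retained schwa as 'a'."""
    return "".join("a" if t == SCHWA else t for t in tokens)


if __name__ == "__main__":
    # कमल, धड़कन, समझना in the hypothetical romanization
    for word in (["k", "@", "m", "@", "l", "@"],
                 ["dh", "@", "R", "@", "k", "@", "n", "@"],
                 ["s", "@", "m", "@", "jh", "@", "n", "aa"]):
        print(render(delete_schwas(word)))
```

Running it prints `kamal`, `dhaRkan` and `samajhnaa`. Morpheme boundaries (e.g. in compounds) are a known source of exceptions to rules like these, which is why the paper treats morphology as a main factor.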
In this paper, we discuss the design and development of a talking ATM for visually impaired people. Automated Teller Machines (ATMs) have become a vital part of our lives, letting us perform financial transactions without the intervention of a human banker. ATMs facilitate cash withdrawal, balance enquiry, mini statements and fund transfer. However, some groups in society, such as people with low vision, visually impaired people and illiterate people, cannot use these services directly because they cannot access the ATM through its screen. They can even be defrauded at ATM centers. Talking ATMs evolved to include these groups digitally. A talking ATM makes ATM services accessible by adding an audio component. Many ATMs provide a headphone jack so that the user can carry out transactions securely. The audio is generated either from a pre-recorded speech corpus or by a speech synthesis engine. The paper summarizes how an ATM works, the need for a talking ATM, the proposed solution for visually impaired users, and the design and development of a talking ATM using concatenative text-to-speech.
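The simplest form of concatenation, splicing pre-recorded prompt clips, can be sketched for a balance-enquiry announcement. Everything here is an assumption rather than the paper's design: a clip inventory with one recording per number from 0 to 99 (Hindi number names up to 99 are largely irregular, so they cannot be composed from digits), unit clips for hundred, thousand, lakh and crore, hypothetical `balance_prefix`/`balance_suffix` phrase clips, and a hypothetical `prompts` directory of WAV files that share one audio format.

```python
import os
import wave
from typing import List

# Indian numbering units, largest first (assumed clip names).
UNITS = ((10**7, "crore"), (10**5, "lakh"), (1000, "thousand"), (100, "hundred"))


def amount_units(n: int) -> List[str]:
    """Clip names that speak a non-negative rupee amount the Indian way."""
    if n < 0:
        raise ValueError("amount must be non-negative")
    if n == 0:
        return ["num_0"]
    parts: List[str] = []
    for size, name in UNITS:
        count, n = divmod(n, size)
        if count:
            # Only the crore count can exceed 99; spell it recursively.
            parts += (amount_units(count) if count > 99 else [f"num_{count}"]) + [name]
    if n:
        parts.append(f"num_{n}")  # remainder is 1..99: one recorded clip
    return parts


def balance_prompt(amount: int) -> List[str]:
    """Hypothetical template: prefix phrase, amount, suffix phrase."""
    return ["balance_prefix"] + amount_units(amount) + ["balance_suffix"]


def clip_paths(names: List[str], clip_dir: str = "prompts") -> List[str]:
    return [os.path.join(clip_dir, f"{name}.wav") for name in names]


def concat_wavs(paths: List[str], out_path: str) -> None:
    """Append the audio frames of clips that share channels, width and rate."""
    if not paths:
        raise ValueError("no clips to concatenate")
    with wave.open(out_path, "wb") as out:
        params = None
        for p in paths:
            with wave.open(p, "rb") as clip:
                if params is None:
                    params = clip.getparams()
                    out.setparams(params)
                elif clip.getparams()[:3] != params[:3]:
                    raise ValueError(f"{p}: audio format differs from first clip")
                out.writeframes(clip.readframes(clip.getnframes()))
```

For example, `amount_units(1234567)` yields the clip sequence for "12 lakh 34 thousand 5 hundred 67", and `concat_wavs(clip_paths(balance_prompt(1234567)), "out.wav")` would splice those clips, provided they exist. Speaking amounts in lakh and crore rather than digit by digit follows how amounts are normally read aloud in India.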