Text-to-speech (TTS) synthesis is the production of artificial speech by a machine from given input text. Speech synthesis can be achieved using concatenative and Hidden Markov Model (HMM) techniques, and the voice synthesized by these techniques must be evaluated for quality. This study presents a comparative analysis of speech synthesis quality between the Hidden Markov Model and unit selection approaches. The quality of the synthesized speech is analyzed subjectively using the mean opinion score (MOS) and objectively using the mean square error (MSE) and peak signal-to-noise ratio (PSNR). Quality is also assessed using Mel-frequency cepstral coefficient (MFCC) features of the synthesized speech. The experimental analysis shows that the unit selection method produces a better synthesized voice than the Hidden Markov Model.
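The objective measures mentioned in this abstract can be illustrated with a short sketch. The Python snippet below is a minimal illustration rather than the authors' implementation: it assumes a time-aligned reference recording and synthesized waveform of comparable scale, and uses librosa for MFCC extraction; the function name and parameters are hypothetical.

```python
import numpy as np
import librosa  # assumed available for MFCC extraction


def objective_scores(reference, synthesized, sr=16000, n_mfcc=13):
    """Compute MSE, PSNR, and MFCC features for a synthesized utterance
    against a time-aligned reference recording (illustrative only)."""
    # Trim both signals to a common length so they can be compared sample-wise.
    n = min(len(reference), len(synthesized))
    ref, syn = reference[:n], synthesized[:n]

    # Mean square error between the two waveforms.
    mse = np.mean((ref - syn) ** 2)

    # Peak signal-to-noise ratio, taking the peak as the maximum
    # absolute amplitude of the reference signal.
    peak = np.max(np.abs(ref))
    psnr = 10 * np.log10(peak ** 2 / mse) if mse > 0 else float("inf")

    # Mel-frequency cepstral coefficients of the synthesized speech.
    mfcc = librosa.feature.mfcc(y=syn, sr=sr, n_mfcc=n_mfcc)
    return mse, psnr, mfcc
```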
This paper describes the implementation of screen readers for Marathi on the Windows and Linux platforms using an unrestricted-domain Marathi Text-to-Speech (MTTS) system with Indian English (IE) support. The application integrates MTTS with the open-source screen readers NVDA and ORCA. MTTS is a syllable-based unit selection concatenative system built around the open-source Festival engine. IE support is provided for smooth navigation and for handling English words encountered while accessing the internet and other applications. The TTS is a concatenation-based system in which the syllable is the largest unit of concatenation, and its output resembles a natural human voice since it uses original speech segments for concatenation. Testing has been done with both normal and differently abled (DA) users, and the system has been tuned to improve user friendliness based on feedback from the DA users. The system achieves a Mean Opinion Score of 86.4% when evaluated by a group of DA users.
Text-to-speech (TTS) synthesis is the production of artificial speech by a machine from given input text. This field of study is known both as speech synthesis, the "synthetic" (computer) generation of speech, and as text-to-speech or TTS: the process of converting written text into speech. Speech synthesis mainly uses two processing components: a natural language processing (NLP) module and a digital signal processing (DSP) module. Speech synthesis has numerous applications, such as reading for blind people, telecommunication services, language education, aids for handicapped persons, talking books and toys, and call center automation. The main aim of this project is to develop a TTS system that produces a voice with an Indian accent for the given input text. For the conversion of text to speech, we use Festival in a Linux environment. Festival is a general pre-packaged tool for developing multi-language speech synthesis systems and supports most languages for text-to-speech conversion. In this project, speech generation is done using the Festival framework and speech tools, and the voice model is generated using the festvox framework together with Festival and the speech tools. The speech data required for building the voice is recorded in a noiseless environment. Voice models can be generated with either the unit selection or the clustergen modules in festvox. It is observed from the generated voices that clustergen voices are better than unit selection voices.
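As an illustration of the Festival-based synthesis workflow described above, the following Python sketch wraps Festival's text2wave utility. This is not the project's actual pipeline: the voice name and output path are assumptions and depend on the festvox voices installed locally.

```python
import subprocess


def synthesize_with_festival(text, out_wav="output.wav", voice=None):
    """Render `text` to a WAV file with Festival's text2wave utility.
    The voice name (e.g. a festvox clustergen or unit selection voice)
    is an assumption and depends on the voices installed locally."""
    cmd = ["text2wave", "-o", out_wav]
    if voice:
        # -eval selects a specific installed voice before synthesis.
        cmd += ["-eval", f"(voice_{voice})"]
    # text2wave reads the input text from stdin when no file is given.
    subprocess.run(cmd, input=text.encode("utf-8"), check=True)


synthesize_with_festival("Hello, this is a test of the Festival voice.")
```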
Mosquito classification is a challenge in computer vision that has not gained much traction. Automatic identification of mosquito species from real-time images is a crucial capability. Mosquitoes are a serious concern since they can spread diseases including dengue fever, Zika, and malaria, so controlling mosquito populations effectively is important. The World Health Organization has reported that over a million people worldwide are affected by malaria and dengue fever each year. In this investigation, we analyze a deep learning VGG-16 network architecture for mosquito species classification on our mosquito dataset, which includes six (6) mosquito species. The pretrained VGG-16 architecture with a transfer learning technique was studied and shown to identify distinct mosquito species with an average accuracy of 97.1751 percent and a loss of 0.0944. The results of VGG-16 and a CNN are compared: the CNN with a multi-class classifier achieves 85.75 percent accuracy, while VGG-16 achieves 97.1751 percent, showing that the VGG-16 model performs considerably better than the CNN.
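To illustrate the transfer learning setup described in this abstract, the following Keras sketch builds a VGG-16 backbone pretrained on ImageNet with a small six-class classification head. The layer sizes, optimizer, and input resolution are assumptions, not the study's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_SPECIES = 6  # six mosquito species, as in the study


def build_vgg16_transfer_model(input_shape=(224, 224, 3)):
    """Transfer learning sketch: a frozen ImageNet VGG-16 backbone
    with a small dense head for six species classes."""
    base = tf.keras.applications.VGG16(
        weights="imagenet", include_top=False, input_shape=input_shape)
    base.trainable = False  # keep the pretrained convolutional filters fixed

    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(NUM_SPECIES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```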