Prosody refers to the structure of sound and rhythm, both of which are essential to speech processing applications. It comprises tone, stress, intonation, and rhythm. Pitch and duration are the core acoustic elements, and information about them can ease the design and development of application modules. A prosody module can be validated through these two features. This paper investigates both using the speech of Sindhi adults. For the experiment and analysis, 245 male and female undergraduate students from five districts of upper Sindh were selected as speakers and grouped by age. Each speaker was given particular sentences and recorded individually. The sentences were then segmented into words and stored in a database of 1960 sounds. The spread of pitch frequency was measured using the standard deviation (SD). The lowest mean SD, 0.25 Hz and 0.28 Hz, came from the male and female groups of district Sukkur, respectively. The highest mean SD, 0.42 Hz and 0.49 Hz, was measured for the male and female groups of district Ghotki. Overall, the pitch of female speakers was higher than that of male speakers, by a variation of 0.072 Hz.
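The mean-SD measurement described in this abstract can be illustrated with a minimal sketch. The abstract does not state whether the sample or the population standard deviation was used, nor how values were averaged, so the sketch assumes a sample SD per speaker averaged over each group. The function name `mean_pitch_sd` and all pitch readings are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def mean_pitch_sd(speakers):
    """Average the per-speaker sample SD of pitch readings (Hz).

    `speakers` is a list of lists; each inner list holds one speaker's
    pitch readings. Sample SD (n - 1 denominator) is an assumption.
    """
    return mean(stdev(readings) for readings in speakers)


# Hypothetical pitch readings in Hz, keyed by (district, gender).
groups = {
    ("Sukkur", "male"): [[121.0, 121.3, 120.8], [118.2, 118.5, 118.1]],
    ("Sukkur", "female"): [[210.4, 210.9, 210.5], [205.0, 205.3, 204.8]],
}

for (district, gender), speakers in groups.items():
    print(f"{district:8s} {gender:6s} mean SD = {mean_pitch_sd(speakers):.2f} Hz")
```

Averaging per-speaker SDs keeps differences between speakers' baseline pitch from inflating the group figure; pooling all readings before computing the SD would measure something different.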
Automating the analysis and synthesis of speech still requires more research effort in general, and in particular for speech processing applications based on Arabic script, such as Sindhi Text-to-Speech. To achieve the required results from such applications, prosodic features must be used extensively, because prosody is closely linked to information about sounds with differing characteristics, such as linguistic rules, complications, and variations of expression. Objectives: This study aims to generate and analyze prosodic information, specifically pitch and duration, from recorded Sindhi sounds using a back-propagation neural network. Methods: Two methods are used to obtain the prosodic information of Sindhi sounds: the PRAAT speech analyser is used to obtain the results, and a back-propagation neural network model is implemented for validation. From four districts of Sindh, 228 speakers were chosen, and their readings of different descriptive sentences were recorded for the experiments. Findings: In experiments with a multilayer neural network model on the collected sounds, a highly acceptable accuracy of 98.8% was achieved at the 18th epoch of 100 epochs. Application and improvements: The generated Sindhi prosodic information and the adopted research methodology will support scholars working on Sindhi speech processing applications. This research can be considered a first step, as no prior work on generating Sindhi prosody was found.
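As a rough illustration of the back-propagation technique named in this abstract, the sketch below trains a one-hidden-layer network with NumPy. The abstract does not describe the network's inputs, outputs, layer sizes, or loss, so everything here is an assumption: the toy data (two synthetic clusters of normalised pitch and duration values), the tanh hidden layer, the sigmoid output, and the learning rate are all hypothetical. Only the 100-epoch count mirrors the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: rows are [normalised pitch, normalised duration].
n = 40
class0 = rng.normal(loc=[-1.0, -1.0], scale=0.3, size=(n, 2))
class1 = rng.normal(loc=[1.0, 1.0], scale=0.3, size=(n, 2))
X = np.vstack([class0, class1])
y = np.concatenate([np.zeros(n), np.ones(n)]).reshape(-1, 1)

# One hidden layer of 4 tanh units, one sigmoid output unit (assumed sizes).
W1 = rng.normal(scale=0.5, size=(2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(scale=0.5, size=(4, 1))
b2 = np.zeros((1, 1))


def forward(inputs):
    """Return hidden activations and output probabilities."""
    hidden = np.tanh(inputs @ W1 + b1)
    prob = 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))
    return hidden, prob


def cross_entropy(prob, target):
    eps = 1e-12  # guards against log(0)
    return float(-np.mean(target * np.log(prob + eps)
                          + (1 - target) * np.log(1 - prob + eps)))


learning_rate = 0.5
losses = []
for epoch in range(100):
    hidden, prob = forward(X)
    losses.append(cross_entropy(prob, y))

    # Backward pass: propagate the output error to every weight.
    d_out = (prob - y) / len(X)                   # sigmoid + cross-entropy
    grad_W2 = hidden.T @ d_out
    grad_b2 = d_out.sum(axis=0, keepdims=True)
    d_hidden = (d_out @ W2.T) * (1 - hidden**2)   # tanh derivative
    grad_W1 = X.T @ d_hidden
    grad_b1 = d_hidden.sum(axis=0, keepdims=True)

    # Full-batch gradient descent update.
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2

_, prob = forward(X)
accuracy = float(np.mean((prob > 0.5) == (y == 1)))
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, accuracy {accuracy:.2%}")
```

The accuracy printed here comes from well-separated synthetic clusters and says nothing about the 98.8% figure reported in the abstract.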
Speech signal analysis for the extraction of speech elements is viable in natural language applications. Rhythm, intonation, stress, and tone are the elements of prosody. These features are essential in emotional speech, speech-to-speech, speech recognition, and other applications. The current study attempts to extract pitch and duration from historical Sindhi sound clips using the superposition of functional contours model. The sampled sound clips contained the speech of 273 undergraduates living in 5 districts of Sindh province. Several Python libraries are available for applying this model, and we used them to extract prosodic data from a variety of sound units. The spoken sentences were segmented into words, syllables, and phonemes. A speech analyzer investigated the acoustics of the sounds with the power spectral density method. The speech database was divided into parts containing words of different sizes (ranging from 1-letter to 5-letter words). The results showed that the minimum and maximum mean (μ) sound durations and pitches were produced by the inhabitants of Khairpur and Ghotki districts, respectively. Both districts lie in the upper part of Sindh province. In addition, a second approach, observed versus obtained, was used to compare outcomes. We observed 5250 durations and 4850 pitches.
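The power spectral density step mentioned in this abstract can be sketched as follows. This is not the study's pipeline: it is a simple windowed periodogram in NumPy that takes the strongest spectral peak inside an assumed speech pitch band (75 to 400 Hz) as the pitch estimate, and it derives duration from the sample count. The function name `estimate_pitch_and_duration`, the band limits, and the synthetic test tone are all hypothetical. In real speech the strongest peak is not always the fundamental, so a dedicated pitch tracker would normally be preferred.

```python
import numpy as np


def estimate_pitch_and_duration(signal, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate pitch (Hz) from the PSD peak and duration (s) from length.

    fmin/fmax bound the search band; the defaults are assumptions.
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    window = np.hanning(n)
    windowed = (signal - signal.mean()) * window

    # Periodogram: squared magnitude spectrum scaled by the window energy.
    # The scaling does not change where the peak lies.
    spectrum = np.fft.rfft(windowed)
    psd = np.abs(spectrum) ** 2 / (sample_rate * np.sum(window**2))
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)

    band = (freqs >= fmin) & (freqs <= fmax)
    pitch_hz = float(freqs[band][np.argmax(psd[band])])
    duration_s = n / sample_rate
    return pitch_hz, duration_s


# Synthetic stand-in for a recorded word: 150 Hz tone plus a weaker harmonic.
sample_rate = 8000
t = np.arange(sample_rate // 2) / sample_rate  # 0.5 s of samples
tone = np.sin(2 * np.pi * 150 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)

pitch, duration = estimate_pitch_and_duration(tone, sample_rate)
print(f"pitch ~ {pitch:.1f} Hz, duration = {duration:.2f} s")
```

With 4000 samples at 8000 Hz the frequency resolution is 2 Hz, which bounds how precisely this approach can locate the peak; longer segments give finer resolution.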