Urdu is written in a cursive script, so connected characters form ligatures. To identify characters within ligatures at different scales (font sizes), a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network are used. Both models are trained on previously extracted ligature thickness graphs, from which they derive meta features. These thickness graphs provide consistent information across font sizes. The LSTM and CNN are also trained on raw images so that performance on the two forms of input can be compared. Two corpora are used in this research: Urdu Printed Text Images (UPTI) and Centre for Language Engineering (CLE) Text Images. Overall performance of the networks ranges between 90% and 99.8%. Average accuracy is 98.08% on meta features and 97.07% on raw images.
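The abstract does not say how the thickness graphs are extracted, so the following is only a minimal sketch under an assumed definition: the fraction of ink pixels in each column of a binarised ligature image, resampled to a fixed number of points. The names `thickness_profile`, `upscale`, and `n_samples` are hypothetical and are not taken from the paper. The sketch is meant to illustrate why a height-normalised profile can stay the same when the same ligature is rendered at a larger size.

```python
from typing import List

Image = List[List[int]]  # binary image as rows of pixels, 1 = ink, 0 = background


def thickness_profile(image: Image, n_samples: int = 4) -> List[float]:
    """Assumed definition: per-column ink fraction, resampled to n_samples points."""
    height = len(image)
    width = len(image[0])
    # Ink pixels in each column, divided by the image height so the value
    # does not grow with the font size.
    columns = [sum(row[x] for row in image) / height for x in range(width)]
    # Nearest-neighbour resampling to a fixed-length sequence.
    return [columns[(i * width) // n_samples] for i in range(n_samples)]


def upscale(image: Image, k: int) -> Image:
    """Enlarge a binary image by an integer factor k (simulates a larger font)."""
    return [[px for px in row for _ in range(k)] for row in image for _ in range(k)]


# Toy 3x4 "ligature", made up for illustration only.
small = [
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
]
large = upscale(small, 2)

profile_small = thickness_profile(small)
profile_large = thickness_profile(large)
```

In this toy case the two profiles are identical even though `large` has four times as many pixels. A fixed-length sequence of this kind could be fed to a sequence model or a 1-D convolutional model, whereas a raw image changes shape with the font size.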
A number of cross-sectional acoustic studies have found that young children's speech segments tend to be longer and more variable than those of older children and adults. However, very little longitudinal information is available on how these measures change across time for individual children. The present investigation is a longitudinal analysis of several temporal characteristics of the speech of 12 children of various ages who were each seen twice, approximately 1 1/2 years apart. For the group, durations decreased on average from the initial to the follow-up recordings by approximately 10%, and temporal variability decreased by about 40%. Among the individual children, however, some showed few, if any, changes in some of the temporal measurements made at the two times, whereas others showed substantial differences. Younger children also did not necessarily show longer durations or greater variability than older children, nor did younger children always show greater changes across time than older children. Thus, although cross-sectional group comparisons indicate a general tendency for increased age to be associated with shorter durations and reduced variability, individual children may not show such patterns or changes across time.
Urdu is spoken by more than 100 million people across a score of countries and is the national language of Pakistan (http://www.ethnologue.com). There is a great need for a text-to-speech system for Urdu because this population has a low literacy rate, so a speech interface would greatly assist in giving them access to information. One of the significant parts of a text-to-speech system is a natural language processor, which takes textual input and converts it into an annotated phonetic string. To enable this, it is necessary to develop models that map textual input onto phonetic content. These models may be very complex for languages with unpredictable behaviour (e.g. English), but Urdu behaves relatively regularly, so Urdu pronunciation may be modelled from Urdu text by defining fairly regular rules. These rules are identified and explained in this paper.
Both inflectional and derivational morphology lead to multiple surface forms of a word. Stemming reduces these forms back to their stem or root and is a very useful tool for many applications. No work has previously been reported on Urdu stemming. The current work develops an Urdu stemmer, Assas-Band, and improves its performance by using more precise affix-based exception lists instead of the conventional lexical lookup employed in stemmers for other languages. Testing shows an accuracy of 91.2%. Further enhancements are also suggested.
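The idea of affix stripping guarded by per-affix exception lists can be sketched as follows. This is a minimal illustration, not the Assas-Band implementation: the function name `stem`, the table names, and the romanised affixes and words are all placeholders chosen for the example and are not taken from the paper's lists.

```python
from typing import Dict, List, Set

# Hypothetical, romanised placeholder data (not the paper's affix lists).
PREFIXES: List[str] = ["be", "na"]
SUFFIXES: List[str] = ["on", "en", "i"]

# One exception list per affix: words that merely begin or end with the
# affix letters, so the affix must NOT be stripped from them.
PREFIX_EXCEPTIONS: Dict[str, Set[str]] = {"be": {"beta"}}
SUFFIX_EXCEPTIONS: Dict[str, Set[str]] = {"i": {"pani"}}


def stem(word: str) -> str:
    """Strip at most one prefix and one suffix unless the word is a listed exception."""
    # Try longer prefixes first so a short affix does not shadow a longer one.
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) > len(prefix):
            if word not in PREFIX_EXCEPTIONS.get(prefix, set()):
                word = word[len(prefix):]
            break
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            if word not in SUFFIX_EXCEPTIONS.get(suffix, set()):
                word = word[:-len(suffix)]
            break
    return word


results = {w: stem(w) for w in ["kitabon", "bekar", "pani", "beta"]}
```

Here `kitabon` and `bekar` lose an affix, while `pani` and `beta` are protected by their exception lists. Keying the exceptions by affix keeps each list short, in contrast to checking every candidate stem against a full lexicon.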