Sindhi Part of Speech Tagging System Using Wordnet

Mahar, Javed Ahmed; Memon, Ghulam Qadir

doi:10.7763/ijcte.2010.v2.198

Cited by 12 publications

(7 citation statements)

References 8 publications

“…The phonological systems of other Indo-Aryan languages are resembled mostly. In Sindhi language, there are 10 vowels and 43 consonants phonemes are unique [6]. When a person speaks in microphone are on telephone, the speech acquisition starts.…”

Section: Methods (mentioning)

confidence: 99%

Challenges of Accent and vowels for Sindhi Speech Recognition System

2021

IJATCSE

Speaking and writing in Sindhi pose many challenges because of its large alphabet of 52 characters. Vowels and accent keep changing in fluent speech and writing. Because of the variety of languages in the world and the dearth of computer scientists working on speech recognition, it is considered a difficult area of study and the least advanced field of Artificial Intelligence. More specifically, difficulties arise in speech recognition for languages such as Arabic and the languages that adapt it, such as Sindhi, Pashto, Urdu, and others. The script and sounds in every language are directly proportional to each other, i.e. a shorter script has fewer sounds while a longer script has more. We developed a speech-to-text recognition system for the Sindhi language with the help of the Sphinx model. We also tested different datasets as input in various phases and compared the results and the accuracy on vowels and accents using the proposed system.

Section: Methods (mentioning)

confidence: 99%

“…To develop systems and techniques for speech input to machine the main aim is the speech recognition area. The spoken words or samples are needed to be collected from various regions and areas so that the different accents can be used on the basis of environment [6].…”

Section: Introduction (mentioning)

confidence: 99%

Challenges of Accent and vowels for Sindhi Speech Recognition System

2021

IJATCSE

“…The results have been presented by applying WordNet and without WordNet and an overall accuracy has been reported as 96.28% without net and 97.14 with word net. The results have been presented with training, testing corpus and unknown words [3]. A morphological analyzer is proposed for Sindhi language by [4].…”

Section: Pos Tagging For Various Regional Languages Of Pakistan Sindh... (mentioning)

confidence: 99%

“…A study presented in [3] applied POS tagging for Sindhi language. The study highlighted the characteristics of Sindhi languages pertaining to POS tagging system such as the lexical and morphological ambiguity.…”

Section: Sindhi Pos Tagging System (mentioning)

confidence: 99%

Analysis and Comparative Study of POS Tagging Techniques for National (Urdu) Language and other Regional Languages of Pakistan

Rajper, Rajper, Maitlo et al. 2021

Sindh Univ. Res. Jour

Natural Language Processing (NLP), an integral part of speech recognition, defines algorithms and techniques that enable computers to understand human language. Part-of-speech (POS) tagging is considered one of the well-understood problems of NLP: the words and sentences of a natural language are tagged with, or assigned, grammatical classes. Tagging words by hand is time-consuming and tedious, and automating the tagging job is the way to automate the lexicons of a language's text. Many languages are enriched with POS tagging systems. Pakistani regional languages are less developed for many reasons, and much work is needed on their POS tagging systems. Some regional languages have POS tagging systems that still need refinement; others need systems developed from scratch. The Balochi language has no POS tagging system at all. This study presents a comparative analysis of POS tagging approaches for the national language (Urdu) and other regional languages of Pakistan. The approaches, the data sets they used, and their reported results are presented here.

“…Sindhi is a less resourced language [3,4] in comparison of English language. Nevertheless, some work has been done on tokenization and POS tagging of Sindhi text [5][6][7] as well as NLP tools are accessible online for solution of Sindhi linguistic problems [7]. In this connection, Sindhi Devanagari script [8]…”

Section: Introduction (mentioning)

confidence: 99%

An Analysis of Sindhi Annotated Corpus using Supervised Machine Learning Methods

Ali, Wagan 2019

Mehran Univ. res. j. eng. technol.

A linguistic corpus of the Sindhi language is significant for computational linguistics, machine learning, language feature identification and analysis, semantic and sentiment analysis, information retrieval, and so on. Little computational linguistics work has been done on Sindhi text, whereas English, Arabic, Urdu, and some other languages are fully resourced computationally. The grammar and morphemes of these languages have been analyzed properly using different machine learning methods. Development and research work on computational linguistics for Sindhi is currently in progress. This study develops a Sindhi annotated corpus using the universal POS (Part of Speech) tag set and a Sindhi POS tag set for the purpose of language feature and variation analysis. Features are extracted using the TF-IDF (Term Frequency and Inverse Document Frequency) technique. A supervised machine learning model is developed to assess the annotated corpus and to determine the grammatical annotation of Sindhi. The model is trained on 80% of the annotated corpus and tested on the remaining 20%. Cross-validation with 10 folds is used to evaluate and validate the model. The results show better performance of the model and confirm the proper annotation of the Sindhi corpus. The study describes a number of research gaps for further work on topic modeling, language variation, and sentiment and semantic analysis of the Sindhi language.
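
The evaluation setup named in this abstract (TF-IDF features, an 80/20 train/test split, and 10-fold cross-validation) can be illustrated with a minimal, standard-library Python sketch. This is not the authors' code: the function names, the placeholder tokens, and the particular TF-IDF variant (relative term frequency times the natural log of inverse document frequency) are assumptions chosen for illustration, and the abstract does not say which classifier or TF-IDF weighting was used.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized, non-empty document.

    Assumed variant: (count / doc length) * ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def split_80_20(items):
    """Split a list into the first 80% (train) and the last 20% (test)."""
    cut = len(items) * 4 // 5
    return items[:cut], items[cut:]


def k_fold_indices(n_items, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_items) if i < start or i >= start + size]
        yield train, test
        start += size


# Hypothetical placeholder tokens standing in for annotated Sindhi words.
corpus = [["w1", "w2"], ["w1", "w3"]]
features = tf_idf(corpus)
train, test = split_80_20(list(range(10)))
folds = list(k_fold_indices(20, 10))
```

In practice the corpus would normally be shuffled before splitting, and a classifier would be trained on the training folds and scored on each held-out fold; both steps are omitted here to keep the sketch short.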
