Mutee U Rahman scite author profile

Mutee U Rahman

5 Publications

19 Citation Statements Received

50 Citation Statements Given

Affiliations

Isra University

Publications

Developing a POS Tagged Corpus of Urdu Tweets

et al. 2020

Processing social media text such as tweets is challenging for traditional Natural Language Processing (NLP) tools, which were developed for well-edited text, because of the noisy nature of such text. However, demand for tools and resources that can correctly process noisy text has increased in recent years because of its usefulness in various applications. The literature reports various efforts to develop such tools and resources for several languages, notably for part-of-speech (POS) tagging, an NLP task that directly affects the performance of subsequent text processing steps. Still, no attempt has been made to develop a POS tagger for Urdu social media content. This paper therefore focuses on POS tagging of Urdu tweets. We introduce a new tagset for POS tagging of Urdu tweets, along with a POS-tagged corpus of Urdu tweets. We also investigate bootstrapping as a potential solution to the shortage of manually annotated data and present a supervised POS tagger that achieves 93.8% precision, 92.9% recall and 93.3% F-measure.
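The three reported scores are consistent with the balanced F-measure (F1), the harmonic mean of precision and recall. The abstract does not state which F-measure variant was used, so β = 1 is an assumption here, and the function name `f_measure` is illustrative only. This short Python sketch recomputes the F-measure from the reported precision and recall:

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1, assuming beta = 1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, in percent.
print(round(f_measure(93.8, 92.9), 1))  # 93.3
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the F-measure always lies between recall and precision when they differ, as it does here.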

Towards Transliteration between Sindhi Scripts Using Roman Script

Leghari, Rahman (2015), LLR

Towards Sindhi Corpus Construction

Rahman (2015), LLR

This paper discusses the current state of Sindhi corpus construction. Corpus development issues, including acquisition, preprocessing and tokenization, are discussed in detail. Preliminary results and observations are presented, including letter unigram, bigram and trigram frequencies, word frequencies and word bigram frequencies. The current state of the Sindhi corpus, its limitations and future work are also discussed. The paper also explores the orthography and script of the Sindhi language with reference to corpus development.
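The letter and word n-gram frequencies mentioned in this abstract come down to simple counting. The Python sketch below is an illustration, not the paper's code, and rests on stated assumptions: it tokenizes on whitespace (the abstract notes that tokenization is itself an open issue for Sindhi), treats each Unicode code point as one letter (which may not hold for text with diacritics or combining marks), and counts letter n-grams within words only. The helper names `char_ngrams` and `word_ngrams` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def char_ngrams(text: str, n: int) -> Counter:
    """Count letter n-grams inside each whitespace-delimited word (no cross-word n-grams)."""
    counts: Counter = Counter()
    for word in text.split():
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def word_ngrams(tokens: List[str], n: int) -> Counter:
    """Count word n-grams, as tuples, over a token sequence."""
    grams: List[Tuple[str, ...]] = [
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    ]
    return Counter(grams)


print(char_ngrams("abab ab", 2).most_common())  # [('ab', 3), ('ba', 1)]
print(word_ngrams("the cat saw the cat".split(), 2).most_common(1))  # [(('the', 'cat'), 2)]
```

Word frequencies are the n = 1 case of `word_ngrams` (or simply `Counter(tokens)`), and letter unigram, bigram and trigram counts are `char_ngrams` with n = 1, 2 and 3.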

Performance Comparison of Bootstrapped Statistical Taggers on Urdu Tweets

Baig, Rahman, Abrejo et al. (2021), IJSRP

Twitter, a social media platform, has experienced substantial growth over the last few years. As a result, a huge number of tweets from various communities is available and used in NLP applications such as opinion mining, information extraction and sentiment analysis. One of the key preprocessing steps in such applications is part-of-speech (POS) tagging. POS tagging of Twitter data (also called noisy text) differs from conventional POS tagging because of its informal nature and the presence of Twitter-specific elements. Resources for POS tagging of tweets are mostly available for English. However, the availability of a tagset and of language-independent statistical taggers gives resource-poor languages such as Urdu an opportunity to extend the coverage of NLP tools to this new POS tagging domain, for which little effort has been reported. The aim of this study is twofold: first, to investigate how well statistical taggers developed for POS tagging of structured text fare on tweets; second, to examine how these taggers can be used to overcome the bottleneck of manually annotated corpora in this new domain. To this end, the Stanford and MorphoDiTa taggers were trained on a gold-standard corpus of 500 Urdu tweets and used for semi-automatic corpus annotation in a bootstrapped fashion. Five bootstrapping iterations were performed for each tagger. At the end of each iteration, the taggers were evaluated against the development set, and 100 automatically tagged, manually corrected tweets were added to the training set to retrain both models. After the last iteration, tagger performance was evaluated against the test set. The Stanford tagger achieved 93.8% precision, 92.9% recall and 93.3% F-measure, whereas the MorphoDiTa tagger achieved 93.5% precision, 92.6% recall and 93% F-measure. A thorough error analysis of the output of both taggers is also presented.
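The bootstrapping procedure in this abstract (train, evaluate on the development set, auto-tag a batch, correct it, add it, repeat) can be sketched as a loop. This is a minimal illustration under stated assumptions, not the authors' pipeline: the Stanford and MorphoDiTa taggers are replaced by a hypothetical `UnigramTagger` that assigns each token its most frequent training tag, evaluation uses plain token accuracy instead of precision/recall/F-measure, manual correction is modeled by a caller-supplied `correct_fn`, and the default tag `"NN"` is a placeholder rather than a tag from the paper's tagset.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Sentence = List[Tuple[str, str]]  # one tweet as (token, tag) pairs


class UnigramTagger:
    """Hypothetical stand-in for a statistical tagger such as Stanford or MorphoDiTa.

    Tags each known token with its most frequent training tag and every
    unknown token with a placeholder default tag.
    """

    def __init__(self, default_tag: str = "NN") -> None:
        self.default_tag = default_tag
        self.lexicon: Dict[str, str] = {}

    def train(self, corpus: List[Sentence]) -> None:
        counts: Dict[str, Counter] = defaultdict(Counter)
        for sentence in corpus:
            for token, tag in sentence:
                counts[token][tag] += 1
        self.lexicon = {tok: c.most_common(1)[0][0] for tok, c in counts.items()}

    def tag(self, tokens: List[str]) -> Sentence:
        return [(tok, self.lexicon.get(tok, self.default_tag)) for tok in tokens]


def token_accuracy(tagger: UnigramTagger, gold: List[Sentence]) -> float:
    """Fraction of gold tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sentence in gold:
        predicted = tagger.tag([tok for tok, _ in sentence])
        correct += sum(p == g for (_, p), (_, g) in zip(predicted, sentence))
        total += len(sentence)
    return correct / total if total else 0.0


def bootstrap(
    seed: List[Sentence],
    pool: List[List[str]],
    dev: List[Sentence],
    test: List[Sentence],
    correct_fn: Callable[[Sentence], Sentence],
    iterations: int = 5,
    batch_size: int = 100,
) -> Tuple[List[float], float]:
    """Each round: retrain, score on dev, auto-tag the next batch, correct it, add it."""
    train = list(seed)
    tagger = UnigramTagger()
    dev_scores: List[float] = []
    for i in range(iterations):
        tagger.train(train)
        dev_scores.append(token_accuracy(tagger, dev))
        batch = pool[i * batch_size:(i + 1) * batch_size]
        # Automatic tagging followed by (simulated) manual correction.
        train.extend(correct_fn(tagger.tag(tokens)) for tokens in batch)
    tagger.train(train)  # retrain on the final training set before testing
    return dev_scores, token_accuracy(tagger, test)
```

In a real run, `correct_fn` would be a human annotation step, `seed` would be the 500-tweet gold-standard corpus, and the defaults `iterations=5` and `batch_size=100` mirror the figures in the abstract. Whether the authors retrained once more before the final test evaluation is not stated; the sketch assumes they did.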

Adverb agreement in Urdu, Sindhi and Punjabi

Butt, Sulger, Rahman et al. (2016), hpsg

We discuss agreeing adverbs in Urdu, Sindhi and Punjabi. We adduce crosslinguistic evidence that is based mainly on similar patterns in Romance and posit that there is a close connection between resultatives and so-called pseudo-resultatives, which the agreeing adverbs appear to instantiate. We propose a diachronic relationship by which the originally predicative part of a resultative is reinterpreted as an adjunct that modifies the overall event predication, not just the result.

