Urdu Named Entity Recognition

Kanwal, Safia; Malik, Kamran; Shahzad, Khurram; Aslam, Faisal; Nawaz, Zahid

doi:10.1145/3329710

Cited by 28 publications

(22 citation statements)

References 12 publications


“…For NER, we use NL, EN, and DE datasets from CoNLL-2002 and 2003 challenges (Tjong Kim Sang, 2002; Tjong Kim Sang and De Meulder, 2003). Additionally, we use the People's Daily dataset, iob2corpus, AQMAR (Mohit et al., 2012), ArmanPerosNERCorpus (Poostchi et al., 2016), MK-PUCIT (Kanwal et al., 2020), and a news-based NER dataset (Mordecai and Elhadad, 2012) for the languages CN, JA, AR, FA, UR, and HE, respectively. Since the NER datasets are individually constructed in each language, their label sets do not fully agree.…”

Section: Datasets (mentioning)

confidence: 99%

Contributions of Transformer Attention Heads in Multi- and Cross-lingual Tasks

Ma, Zhang, Lou et al. 2021

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Confer


This paper studies the relative importance of attention heads in Transformer-based models to aid their interpretability in cross-lingual and multi-lingual tasks. Prior research has found that only a few attention heads are important in each mono-lingual Natural Language Processing (NLP) task and pruning the remaining heads leads to comparable or improved performance of the model. However, the impact of pruning attention heads is not yet clear in cross-lingual and multi-lingual tasks. Through extensive experiments, we show that (1) pruning a number of attention heads in a multilingual Transformer-based model has, in general, positive effects on its performance in cross-lingual and multi-lingual tasks and (2) the attention heads to be pruned can be ranked using gradients and identified with a few trial experiments. Our experiments focus on sequence labeling tasks, with potential applicability on other cross-lingual and multi-lingual tasks. For comprehensiveness, we examine two pre-trained multi-lingual models, namely multi-lingual BERT (mBERT) and XLM-R, on three tasks across 9 languages each. We also discuss the validity of our findings and their extensibility to truly resource-scarce languages and other task settings.


“…The Named Entity Recognition system recognizes named entities (NEs) and classifies them into predefined categories, such as a person, location, organization, and time [1]. It is used as the first step in question answering [2], information retrieval [1], text summarization [3], machine translation [4], and more [5]. A series of neural NER models have been proposed over the past decade for English [6][7][8][9], Chinese [10][11][12], Japanese [13], Urdu [4,14], and multilingual systems [6,15], which have achieved state-of-the-art performance.…”

Section: Introduction (mentioning)

confidence: 99%

“…It is used as the first step in question answering [2], information retrieval [1], text summarization [3], machine translation [4], and more [5]. A series of neural NER models have been proposed over the past decade for English [6][7][8][9], Chinese [10][11][12], Japanese [13], Urdu [4,14], and multilingual systems [6,15], which have achieved state-of-the-art performance. The NER task in Asian languages [16] has recently attracted many researchers due to its importance and widespread NLP applications.…”

Section: Introduction (mentioning)

confidence: 99%

“…Unlike other languages, Sindhi has spelling variations, ambiguities in suffixes, and different writing styles (inclusion and exclusion of space), which further contribute to increasing the difficulty in language processing and the NER [17] task. The NER in the English language has significantly benefited from its capitalization rule, part-of-speech tagging, and availability of other language resources [4]. On the contrary, Sindhi has no capitalization rule, which makes difference between plain text and NEs [17].…”

Section: Introduction (mentioning)

confidence: 99%

“…We summarize the challenges related to Sindhi NER and characteristics in Table 1. Some of these ambiguities may also be found in other Asian languages, such as Urdu [4]. Table 1.…”

Section: Introduction (mentioning)

confidence: 99%


Context-Aware Bidirectional Neural Model for Sindhi Named Entity Recognition

Ali, Kumar et al. 2021

Applied Sciences


Named entity recognition (NER) is a fundamental task in many natural language processing (NLP) applications, such as text summarization and semantic information retrieval. Recently, deep neural networks (NNs) with the attention mechanism yield excellent performance in NER by taking advantage of character-level and word-level representation learning. In this paper, we propose a deep context-aware bidirectional long short-term memory (CaBiLSTM) model for the Sindhi NER task. The model relies upon contextual representation learning (CRL), bidirectional encoder, self-attention, and sequential conditional random field (CRF). The CaBiLSTM model incorporates task-oriented CRL based on joint character-level and word-level representations. It takes character-level input to learn the character representations. Afterwards, the character representations are transformed into word features, and the bidirectional encoder learns the word representations. The output of the final encoder is fed into the self-attention through a hidden layer before decoding. Finally, we employ the CRF for the prediction of label sequences. The baselines and the proposed CaBiLSTM model are compared by exploiting pretrained Sindhi GloVe (SdGloVe), Sindhi fastText (SdfastText), task-oriented, and CRL-based word representations on the recently proposed SiNER dataset. Our proposed CaBiLSTM model achieved a high F1-score of 91.25% on the SiNER dataset with CRL without relying on additional handmade features, such as hand-crafted rules, gazetteers, or dictionaries.
