2022
DOI: 10.1093/comjnl/bxac047

Urdu Named Entity Recognition System Using Deep Learning Approaches

Abstract: Named entity recognition (NER) is a fundamental part of other natural language processing tasks such as information retrieval, question answering systems and machine translation. Progress and success have already been achieved in research on the English NER systems. However, the Urdu NER system is still in its infancy due to the complexity and morphological richness of the Urdu language. Existing Urdu NER systems are highly dependent on manual feature engineering and word embedding to capture similarity. Their…

citations
Cited by 8 publications
(4 citation statements)
references
References 21 publications
“…Determining the optimal n-gram window size depends on multiple factors including the nature of the text in a corpus, different sentence structures that may produce different patterns and dependencies between words, and target language and writing styles. For the Urdu language, various studies generally use an n-gram window size ranging from 3 to 5 for different NLP tasks while existing studies on Urdu NER [29][30][31][53] have used a window size of five. For this study, we tested different n-gram window size values ranging from three to five and selected the best-performing n-gram window size of five.…”
Section: Hyperparameter Tuning
Citation type: mentioning
confidence: 99%
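
To make the window-size comparison in the statement above concrete, here is a minimal Python sketch of how n-gram windows of size three, four and five could be extracted from one tokenized sentence. It assumes the simplest reading of "n-gram window", namely n consecutive tokens. The function name `ngram_windows` and the placeholder tokens are hypothetical and are not taken from the cited studies, which may build their windows differently (for example, centred on a target token with padding).

```python
from typing import List


def ngram_windows(tokens: List[str], n: int) -> List[List[str]]:
    """Return every run of n consecutive tokens, in order.

    Hypothetical helper for illustration only; sentences shorter
    than n yield an empty list.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]


# Placeholder tokens standing in for one tokenized sentence.
tokens = ["t1", "t2", "t3", "t4", "t5", "t6"]

# Compare the candidate window sizes mentioned in the statement.
for n in (3, 4, 5):
    windows = ngram_windows(tokens, n)
    print(n, len(windows), windows[0])
```

A larger window gives each example more surrounding context but produces fewer windows per sentence, which is one reason the best size has to be chosen empirically for a given corpus.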
“…Deep learning (DL) based studies use different pre-trained word embedding techniques [21][22][23][24][25][26][27] to map the words in vectors using the language vocabulary to automatically extract meaningful relationships among words in the dataset. Due to limited vocabulary size, out-of-vocabulary words pose significant challenges for morphology-rich languages [28] like Urdu due to language complexities [29]. Existing studies have used DL models for Urdu NER [29][30][31] by utilizing Word2Vec, Txt2Vec, MKW2v, and GloVe embeddings.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Supervised systems infer rules through pre-labeled input, referred to as training data, and conform to estimation methods, nonparametric, or kernel-based learning algorithms, as well as logic-based algorithms [23,25]. NER frameworks based on machine learning are often more customizable than rule-based techniques [26]. An ML technique could adjust to new contexts with little cost provided that training data is freely available [18].…”
Section: Related Work
Citation type: mentioning
confidence: 99%