2023
DOI: 10.1109/access.2023.3267746

B-NER: A Novel Bangla Named Entity Recognition Dataset With Largest Entities and Its Baseline Evaluation

Abstract: Within the Natural Language Processing (NLP) framework, Named Entity Recognition (NER) is regarded as the basis for extracting key information to understand texts in any language. As Bangla is a highly inflectional, morphologically rich, and resource-scarce language, building a balanced NER corpus with large and diverse entities is a demanding task. However, previously developed Bangla NER systems are limited to recognizing only three familiar entities: person, location, and organization. To address this signi…

Cited by 5 publications (3 citation statements)
References 44 publications
“…After closely examining the samples in the dataset, we found that some of them contained digits and numbers with no apparent semantic meaning. Phone numbers, currencies, and percentages are examples of numerical entities that are commonly identified and categorized using Named Entity Recognition (NER) tools [48]. However, there is currently no NER tool available for Chittagonian text.…”
Section: Removing English Digits (mentioning)
confidence: 99%
“…It will also help relieve the strain on healthcare resources and the increased demand for medical consultations. Haque et al [96] proposed the unique dataset B-NER, the biggest fine-grained Bangla NER dataset by employing the BIO tagging approach which have been produced by using 22,144 sentences that have been directly annotated and gathered from Bangla newspapers and Bangla Wikipedia. There are 9,895 separate phrases in this dataset that have been manually classified into eight different categories, including organizations, events, people, time, artifacts, markers, geopolitical entities, geographic locations and natural phenomena.…”
Section: (C) Deep Learning Approach (mentioning)
confidence: 99%
“…Removing English digits: Upon thorough examination of the dataset samples, we observed the presence of digits and numbers that did not carry specific semantic meaning. In standard practice, named entity recognition (NER) [61] tools are employed to identify and categorize such numerical entities, such as phone numbers, percentages, and currencies. However, for the Chittagonian dialect, no NER tool is currently available.…”
Section: Data Preprocessing (mentioning)
confidence: 99%