Named entity recognition (NER) is a fundamental task in NLP and a prerequisite for most other NLP processes. This article addresses the recognition of named entities in the context of social networks. To this end, text is segmented into expressions that are suitable or unsuitable as named entities. The first contribution of this article is to process informal Persian text with the beam search algorithm to detect named entities. Because language is productive, new words and names are constantly coined, and available NER systems are inefficient at detecting new entities. The second contribution is to make it possible to recognize emerging named entities by applying dynamic external knowledge. Given the lack of datasets for low-resource languages, N-gram and Wikipedia anchor datasets have been prepared for Persian and deployed as external knowledge. In addition, a corpus of Persian named entities has been generated from a Telegram dataset and labeled by three native experts. A comparison of these three experts with the proposed method shows that the method's results are acceptable relative to both human-to-human agreement and other methods.
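The segmentation idea can be illustrated with a minimal sketch of beam search over phrase boundaries. This is not the article's implementation: the `KNOWLEDGE` table, the `score` function, the `beam_width` and `max_len` parameters, and the English tokens are all hypothetical stand-ins for the external knowledge (N-gram and Wikipedia anchor statistics) that the abstract describes.

```python
from typing import Callable, Dict, List, Tuple

Phrase = Tuple[str, ...]

# Hypothetical stand-in for external knowledge such as n-gram or anchor statistics.
KNOWLEDGE: Dict[Phrase, float] = {
    ("ali",): 1.0,
    ("tehran", "university"): 2.0,
}


def score(phrase: Phrase) -> float:
    """Assumed scoring rule: known phrases earn their table score,
    unknown single tokens are neutral, unknown multi-token phrases are penalized."""
    if phrase in KNOWLEDGE:
        return KNOWLEDGE[phrase]
    return 0.0 if len(phrase) == 1 else -1.0


def beam_segment(
    tokens: List[str],
    score_fn: Callable[[Phrase], float],
    beam_width: int = 3,
    max_len: int = 3,
) -> Tuple[float, List[Phrase]]:
    """Return the best (total score, segmentation) found by beam search."""
    # beams[i] holds up to beam_width partial segmentations covering tokens[:i].
    beams: Dict[int, List[Tuple[float, List[Phrase]]]] = {0: [(0.0, [])]}
    for end in range(1, len(tokens) + 1):
        candidates: List[Tuple[float, List[Phrase]]] = []
        for start in range(max(0, end - max_len), end):
            phrase = tuple(tokens[start:end])
            for total, segments in beams[start]:
                candidates.append((total + score_fn(phrase), segments + [phrase]))
        # Prune: keep only the highest-scoring partial segmentations.
        candidates.sort(key=lambda item: item[0], reverse=True)
        beams[end] = candidates[:beam_width]
    return beams[len(tokens)][0]


tokens = "ali went to tehran university".split()
best_score, best_segments = beam_segment(tokens, score)
print(best_score, best_segments)
```

Under these assumed scores, the multi-token phrase ("tehran", "university") is kept together as one candidate, while the unknown tokens remain single-token segments. A wider beam trades speed for a lower risk of pruning the best segmentation early.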
Named entity recognition (NER) is a subfield of natural language processing (NLP). It identifies proper nouns, such as person names, locations, and organizations, and has been widely used in various tasks. NER is useful for extracting information from social media data. However, the unstructured and noisy nature of social media text (such as grammatical errors and typos) poses new challenges for NER, especially for low-resource languages such as Persian, because existing NER methods mainly focus on formal texts and English social media. To overcome this challenge, we treat Persian NER as an optimization problem and use the binary Gray Wolf Optimization (GWO) algorithm to segment posts into small candidate phrases that may be named entities. Named entities are then recognized based on their scores. We also show that even human annotators can disagree on the NER task. We compare our method with other systems on the Sep_TD_Tel01 dataset, and the results show that our proposed system obtains a higher F1 score than the other methods.
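To make the optimization framing concrete, here is a minimal sketch of a binary GWO search over segmentations. The abstract does not specify the encoding or the fitness function, so everything here is an assumption: each bit marks a phrase boundary between adjacent tokens, the fitness is the sum of hypothetical phrase scores from a toy `KNOWLEDGE` table, and the binarization step uses one common sigmoid transfer variant of binary GWO.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Phrase = Tuple[str, ...]

# Hypothetical phrase scores standing in for the paper's (unspecified) scoring.
KNOWLEDGE: Dict[Phrase, float] = {("ali",): 1.0, ("tehran", "university"): 2.0}


def score(phrase: Phrase) -> float:
    if phrase in KNOWLEDGE:
        return KNOWLEDGE[phrase]
    return 0.0 if len(phrase) == 1 else -1.0


def decode(tokens: List[str], bits: List[int]) -> List[Phrase]:
    """bits[i] == 1 places a boundary between tokens[i] and tokens[i + 1]."""
    segments: List[Phrase] = []
    current = [tokens[0]]
    for token, bit in zip(tokens[1:], bits):
        if bit:
            segments.append(tuple(current))
            current = [token]
        else:
            current.append(token)
    segments.append(tuple(current))
    return segments


def fitness(tokens: List[str], bits: List[int], score_fn: Callable[[Phrase], float]) -> float:
    return sum(score_fn(phrase) for phrase in decode(tokens, bits))


def binary_gwo(tokens, score_fn, wolves=8, iters=30, seed=0):
    """Assumes len(tokens) >= 2 and wolves >= 3. Returns (best_bits, best_fitness)."""
    rng = random.Random(seed)
    dim = len(tokens) - 1
    pack = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(wolves)]
    best_bits, best_fit = None, -math.inf
    for step in range(iters + 1):
        ranked = sorted(pack, key=lambda b: fitness(tokens, b, score_fn), reverse=True)
        top_fit = fitness(tokens, ranked[0], score_fn)
        if top_fit > best_fit:
            best_bits, best_fit = list(ranked[0]), top_fit
        if step == iters:
            break  # final pack has been evaluated; no further update
        alpha, beta, delta = ranked[0], ranked[1], ranked[2]
        a = 2.0 - 2.0 * step / iters  # exploration factor decays from 2 toward 0
        new_pack = []
        for wolf in pack:
            new_wolf = []
            for d in range(dim):
                x = 0.0
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - wolf[d])
                    x += leader[d] - A * D
                x /= 3.0  # average pull of the three leading wolves
                prob = 1.0 / (1.0 + math.exp(-10.0 * (x - 0.5)))  # sigmoid transfer
                new_wolf.append(1 if rng.random() < prob else 0)
            new_pack.append(new_wolf)
        pack = new_pack
    return best_bits, best_fit


tokens = "ali went to tehran university".split()
best_bits, best_fit = binary_gwo(tokens, score)
print(best_fit, decode(tokens, best_bits))
```

Because the search is stochastic, the result depends on the seed and is not guaranteed to be the optimal segmentation. In a pipeline like the one described, the decoded phrases would then be ranked by score to decide which ones are named entities.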