The Named Entity Recognition (NER) task has been garnering significant attention in NLP because it helps improve the performance of many natural language processing applications. In this paper, we investigate the impact of different feature sets on two discriminative machine learning frameworks, Support Vector Machines and Conditional Random Fields, using Arabic data. We explore lexical, contextual and morphological features on eight standardized data sets of different genres. We measure the impact of each feature in isolation, rank the features according to their impact on each named entity class, and incrementally combine them in order to infer the optimal machine learning approach and feature set. Our system achieves an F-measure (β=1) of 83.5 on ACE 2003 Broadcast News data.
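The rank-then-combine procedure described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `rank_and_combine`, the `evaluate` callback, and the toy scores are all hypothetical, and the abstract does not say how the best combination is selected, so the sketch simply keeps the best-scoring prefix of the ranked list.

```python
from typing import Callable, FrozenSet, List, Tuple

Score = Callable[[FrozenSet[str]], float]


def rank_and_combine(
    features: List[str], evaluate: Score
) -> Tuple[List[str], Tuple[List[str], float]]:
    """Rank features by isolated score, then add them one at a time.

    `evaluate` is a hypothetical callback standing in for "train a
    classifier on this feature set and return its F-measure".
    """
    # Step 1: score each feature in isolation and rank, best first.
    ranked = sorted(features, key=lambda f: evaluate(frozenset([f])), reverse=True)

    # Step 2: evaluate every prefix of the ranking (top 1, top 2, ...).
    history = []
    for k in range(1, len(ranked) + 1):
        prefix = ranked[:k]
        history.append((prefix, evaluate(frozenset(prefix))))

    # Step 3 (assumption): keep the prefix with the highest score.
    best = max(history, key=lambda item: item[1])
    return ranked, best


# Toy stand-in for a real train-and-score run; the weights are invented.
TOY_WEIGHTS = {"lexical": 3.0, "contextual": 2.0, "noise": -1.0}


def toy_evaluate(subset: FrozenSet[str]) -> float:
    return sum(TOY_WEIGHTS[name] for name in subset)


ranked, best = rank_and_combine(["noise", "contextual", "lexical"], toy_evaluate)
```

In a real setting the procedure would be run once per named entity class, since the abstract states that features are ranked separately for each class, and `evaluate` would wrap an SVM or CRF training run scored on held-out data.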
This Ph.D. thesis describes the investigations we carried out to determine the appropriate approach for building an efficient and robust Arabic Named Entity Recognition system. Such a system would be able to identify and classify the Named Entities in open-domain Arabic text. The Named Entity Recognition (NER) task helps other Natural Language Processing applications (e.g. Information Retrieval, Question Answering, Machine Translation) achieve higher performance thanks to the significant information it adds to the text. Many published studies report suitable approaches for building an NER system, either for a specific language or from a language-independent perspective. Yet very few published studies investigate the task for Arabic. The Arabic language has a special orthography and a complex morphology, which bring new challenges to the NER task. A complete investigation of Arabic NER would report the technique that achieves high performance and would also give a detailed error analysis and discussion of results, so as to make the study beneficial to the research community. This thesis aims to satisfy this specific need. To achieve that goal we have: 1. Studied the aspects of the Arabic language that are relevant to the NER task; 2. Studied the state of the art of the NER task; 3. Conducted a comparative study of the most successful Machine Learning approaches to the NER task; 4. Developed a multi-classifier approach in which each classifier deals with only one NE class and uses the Machine Learning approach and feature set appropriate for that class.