Ontological lexicons are considered a rich source of knowledge for the development of various natural language processing tools and applications; however, they are expensive to build, maintain, and extend. In this paper, we present the Badea system for the semi-automated extraction of lexical relations, specifically antonyms, using a pattern-based approach to support the task of ontological lexicon enrichment. The approach is based on an ontology of "seed" pairs of antonyms in the Arabic language; we identify patterns in which the pairs occur and then use the identified patterns to find new antonym pairs in Arabic textual corpora. Experiments are conducted on Badea using texts from three Arabic corpora: KSUCCA, KACSTAC, and CAC. The system is evaluated, and the patterns' reliability and the system's performance are measured. The results from our experiments on the three Arabic corpora show that the pattern-based approach can be useful in the ontological enrichment task: the evaluation resulted in the ontology being updated with over 300 new antonym pairs, thereby enriching the lexicon and increasing its size by over 400%. Moreover, the results yield important findings on the reliability of patterns in extracting antonyms for Arabic. The Badea system will facilitate the enrichment of ontological lexicons, which can be very useful in any Arabic natural language processing system that requires semantic relation extraction.
Keywords: Antonym Extraction, Ontology, Arabic Lexicon, Semantic Relation, Arabic NLP
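As an illustration only, the seed-and-pattern idea summarized in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the Badea implementation: the function names `learn_patterns` and `apply_patterns`, the `MAX_GAP` limit, the whitespace tokenization, the one-word left context, and the English toy sentences are all hypothetical choices made for readability.

```python
import re
from collections import Counter

MAX_GAP = 3  # assumed limit on the number of connector words between the two terms


def learn_patterns(sentences, seed_pairs):
    """Count (left-context word, connector words) templates around seed pairs."""
    patterns = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenization (assumption)
        for a, b in seed_pairs:
            for x, y in ((a, b), (b, a)):  # try both orders of the pair
                if x not in tokens or y not in tokens:
                    continue
                i, j = tokens.index(x), tokens.index(y)
                # keep only cases with a short, non-empty connector and a left context word
                if i > 0 and 0 < j - i - 1 <= MAX_GAP:
                    patterns[(tokens[i - 1], tuple(tokens[i + 1:j]))] += 1
    return patterns


def apply_patterns(sentences, patterns, known_pairs):
    """Match learned templates against text and count candidate pairs not yet known."""
    known = {frozenset(pair) for pair in known_pairs}
    candidates = Counter()
    for left, middle in patterns:
        connector = r"\s+".join(re.escape(word) for word in middle)
        regex = re.compile(
            r"\b" + re.escape(left) + r"\s+(\w+)\s+" + connector + r"\s+(\w+)"
        )
        for sentence in sentences:
            for x, y in regex.findall(sentence.lower()):
                if x != y and frozenset((x, y)) not in known:
                    candidates[tuple(sorted((x, y)))] += 1
    return candidates


# Toy English data, used purely for illustration.
seeds = [("hot", "cold")]
corpus = [
    "the water was neither hot nor cold",
    "the answer is neither right nor wrong",
    "people came from near and far",
]

patterns = learn_patterns(corpus, seeds)
candidates = apply_patterns(corpus, patterns, seeds)
print(patterns)
print(candidates)
```

In this toy run the seed pair yields a single "neither X nor Y" template, which then proposes one new candidate pair from the second sentence. In a semi-automated setting, such candidates would presumably still be reviewed before being added to the ontology, and a pattern's match counts could feed a reliability estimate.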
INTRODUCTION
A lexicon is defined as "the knowledge that a native speaker has about a language. This includes information about the form and meanings of words and phrases, lexical categorization, the appropriate usage of words and phrases, relationships between words and phrases, and categories of words and phrases" [1]. A lexicon is an essential element of natural language processing (NLP) applications. For some applications, such as machine translation, the lexicon is a critical resource [2]. An important aspect that renders such lexicons effective, reusable, and sharable within the community is building them upon standards, such as Semantic Web standards, in the form of ontologies. An ontological lexicon is a lexicon designed using an ontological model and developed as an ontology. Ontological lexicons play a vital role in NLP applications such as language analysis, semantic annotation, summarization, machine translation, sense disambiguation, generation of lexical-competence questions used in standard language tests, and other applications that rely on implicit information in the text.
Although ontological lexicons provide a rich source of knowledge for NLP applications, like other types of computational lexicons, they are expensive to build, maintain, and extend [2] [3] [4]. Moreover, the task of relation extraction is essential in any ontological lexicon development. Relation extraction focuses on the extraction of structured relations from unstructured sources such as Web documents ...