In this paper, we address the problem of building large-coverage dictionaries of the Arabic language that are usable both for direct human reading and for automatic Natural Language Processing. For these purposes, we propose a normalized and implemented model, based on the Lexical Markup Framework (LMF, ISO 24613) and the Data Category Registry (DCR, ISO 12620), which allows stable and well-defined interoperability of lexical resources through a unification of linguistic concepts. Starting from the features of the Arabic language, and because a large range of details and refinements need to be described specifically for Arabic, we follow a fine-grained structuring strategy. Besides its richness in morphological, syntactic and semantic knowledge, our model includes all the Arabic morphological patterns needed to generate the inflected forms from a given lemma, and it highlights the syntactic-semantic relations. In addition, an appropriate codification has been designed for managing all types of relationships among lexical entries and their related knowledge. According to this model, a dictionary named El Madar has been built and is now publicly available online. The data are managed by a user-friendly Web-based lexicographical workstation. This work was not done in isolation; it is the result of a collaborative effort by an international team, mainly within the ISO network, over a period of eight years.
A. Khemakhem et al.
... is to merge them in order to obtain a new, richer resource. More generally, exchange remains a difficult (and expensive) issue when nothing has been planned for this purpose. To meet this challenge, several projects were conducted, such as ACQUILEX (Bogurev et al. ...). These projects led to the emergence of the LMF (Lexical Markup Framework) ISO standard for lexical structure modeling (ISO 24613) (Francopoulo 2003; Francopoulo and George 2008), in association with the ISO Data Category Registry (DCR) following ISO 12620 (Ide and Romary 2004).
These standards were designed by a group of sixty ISO experts from different cultures, languages and continents. Numerous developments followed in different parts of the world. Unfortunately, the Arabic language did not immediately benefit from the emergence of these standards, although it is spoken by more than 300 million people around the world and is the official language of more than twenty countries. The language still relies on different printed dictionaries based on incompatible lexicographical schools. Only a few works have tried to apply LMF to the Arabic language, and they followed earlier revisions of the standard. Some developments were made in morphology (Khemakhem, Gargouri and Abdelwahed 2006; Romary, Salmon-Alt and Francopoulo 2004; Salmon-Alt, Akrout and Romary 2005), and some studies were conducted in syntax (Loukil, Haddar and Ben Hamadou 2008). However, these works were developed during the drafting of the LMF standard and were not updated after its validation by ISO. Obviously, the situation of the Arabic lexica...
This paper addresses the development of Arabic electronic dictionaries for human (editorial) use. It proposes a unified and standardized model for these dictionaries according to the future standard LMF (Lexical Markup Framework), ISO 24613. Thanks to its fine-grained and standardized structure, this model allows the development of extensible dictionaries on which generic query functions adapted to the user's needs can be implemented. The model has already been applied to some existing Arabic dictionaries using the ADIQTO (Arabic DIctionary Query TOol) system, which we developed for the generic querying of standardized Arabic dictionaries.
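To make the idea of generic query functions over a standardized structure more concrete, here is a minimal sketch in Python. It is not the ADIQTO implementation and not a complete LMF model: the class names loosely follow LMF terminology (Lexicon, Lexical Entry, Sense), while the fields, the `query` method, the two filter helpers and the three sample entries are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Sense:
    # Hypothetical field: one plain-text definition per sense.
    definition: str


@dataclass
class LexicalEntry:
    # Hypothetical, heavily simplified view of an LMF lexical entry.
    lemma: str
    part_of_speech: str
    senses: List[Sense] = field(default_factory=list)


@dataclass
class Lexicon:
    language: str
    entries: List[LexicalEntry] = field(default_factory=list)

    def query(self, predicate: Callable[[LexicalEntry], bool]) -> List[LexicalEntry]:
        """Return every entry accepted by the predicate, in lexicon order."""
        return [entry for entry in self.entries if predicate(entry)]


def by_part_of_speech(pos: str) -> Callable[[LexicalEntry], bool]:
    """Build a filter that keeps entries with the given part of speech."""
    return lambda entry: entry.part_of_speech == pos


def by_definition_keyword(keyword: str) -> Callable[[LexicalEntry], bool]:
    """Build a filter that keeps entries with a sense mentioning the keyword."""
    return lambda entry: any(keyword in sense.definition for sense in entry.senses)


# Illustrative sample data only; not taken from any existing dictionary.
lexicon = Lexicon(
    language="ara",
    entries=[
        LexicalEntry("كتاب", "noun", [Sense("a set of bound pages")]),
        LexicalEntry("كتب", "verb", [Sense("to write")]),
        LexicalEntry("مكتبة", "noun", [Sense("a place where books are kept")]),
    ],
)

nouns = lexicon.query(by_part_of_speech("noun"))
writing = lexicon.query(by_definition_keyword("write"))
```

Because every dictionary shares the same entry structure, the same `query` call works with any filter; a new kind of lookup only requires a new predicate, not a new traversal of the data.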
Abstract. Collaborative enrichment is a new trend in the construction of resources, notably electronic dictionaries. This approach is considered very efficient for weakly structured resources. In this paper, we deal with applying collaborative enrichment to electronic dictionaries standardized according to LMF (ISO 24613). The models of such dictionaries are complex and finely structured. The purpose of the paper is, on the one hand, to expose the challenges related to this framework and, on the other hand, to propose practical solutions based on an appropriate approach. This approach ensures the completeness, consistency and non-redundancy of lexical data. To illustrate the proposed approach, we describe the experiment carried out on a standardized Arabic dictionary. Keywords: Collaborative enrichment, LMF normalized dictionaries, coherence, non-redundancy, completeness.
Introduction
Electronic dictionaries contribute enormously to the learning, dissemination, maintenance and evolution of natural languages. However, the construction of such dictionaries is a difficult task given the richness of natural languages. It is very expensive in time and in the number of people needed to key in the enormous content of lexical resources. Moreover, it is not limited in time because of the continuous need for enrichment. To tackle the problems related to the enrichment of electronic dictionaries, the trend has been to resort to a collaborative approach. Several works have therefore been proposed, such as [3], [4], [12] and [14]. The best-known application of the collaborative approach to filling and updating large resources is Wiktionary [13], which currently covers several languages. However, these works deal with a superficial structure (or model) of resources. Indeed, their syntactic models are very light and do not link synonyms through the senses concerned.
Moreover, relations between senses and syntactic knowledge are not covered. Thus, such resources can be updated by all kinds of users, who are not necessarily experts in lexicography or linguistics.