This paper surveys Arabic treebanks with the aim of facilitating their reuse in building new linguistic resources. In our case, we automatically induced a Property Grammar (GP) from a treebank, so we discuss the characteristics of the available treebanks in order to choose the most appropriate one. To build our resource, we adopted an automatic technique in two steps: first acquiring a context-free grammar (CFG) from the chosen treebank, and then inducing a GP by generating relations between the grammatical units described in the CFG.
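The two-step technique described above (CFG acquisition, then induction of relations between the units of the CFG) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a treebank given as bracketed strings, uses toy English-like labels rather than real ATB tags, and induces only three deliberately simplified property types (constituency, uniqueness, linearity). All function names are hypothetical.

```python
from collections import defaultdict

# Toy bracketed trees; labels are illustrative, NOT actual ATB annotation.
TOY_TREES = [
    "(S (NP (DET the) (N cat)) (VP (V sees) (NP (N dogs))))",
    "(S (NP (N cats)) (VP (V sleep)))",
]

def parse_tree(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()

def extract_rules(tree, rules):
    """Step 1: collect CFG rules LHS -> RHS, ignoring lexical (tag -> word) rules."""
    label, children = tree
    subtrees = [c for c in children if isinstance(c, tuple)]
    if subtrees:
        rules[label].add(tuple(c[0] for c in subtrees))
        for c in subtrees:
            extract_rules(c, rules)

def cfg_from_treebank(trees):
    rules = defaultdict(set)
    for text in trees:
        extract_rules(parse_tree(text), rules)
    return rules

def induce_properties(rules):
    """Step 2: derive simplified properties for each left-hand side."""
    props = {}
    for lhs, rhs_set in rules.items():
        # Constituency: every category seen in some right-hand side.
        cats = sorted({c for rhs in rhs_set for c in rhs})
        # Uniqueness: categories that never occur twice in one right-hand side.
        unique = [c for c in cats if all(rhs.count(c) <= 1 for rhs in rhs_set)]
        # Linearity: a precedes b in every right-hand side where both occur.
        linear = []
        for a in cats:
            for b in cats:
                if a == b:
                    continue
                both = [rhs for rhs in rhs_set if a in rhs and b in rhs]
                if both and all(
                    max(i for i, c in enumerate(rhs) if c == a)
                    < min(i for i, c in enumerate(rhs) if c == b)
                    for rhs in both
                ):
                    linear.append((a, b))
        props[lhs] = {"constituency": cats, "uniqueness": unique, "linearity": linear}
    return props

if __name__ == "__main__":
    for lhs, p in induce_properties(cfg_from_treebank(TOY_TREES)).items():
        print(lhs, p)
```

A full Property Grammar would also cover further property types such as requirement, exclusion, obligation and dependency. They are omitted here to keep the sketch short.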
Abstract. We present a method based on the Property Grammar formalism for enriching the Arabic treebank ATB with syntactic constraints (so-called properties). The Property Grammar formalism is a constraint-based approach that directly specifies the constraints on information categories, which facilitates the enrichment process. This process has three phases: problem formalization, Property Grammar induction from the ATB, and regeneration of the treebank with a new syntactic property-based representation. Enriching the ATB can make it more useful for many NLP applications, such as ambiguity resolution. It also enables the acquisition of new linguistic resources and eases probabilistic parsing. The enrichment process is fully automatic and independent of any particular language or source-corpus formalism, which motivates its reuse. Our experiments gave good and encouraging results and yielded various properties of different types.
Enriching an Arabic treebank with syntactic properties can facilitate many types of parsing. It also broadens the treebank's use in different NLP applications, enables the acquisition of new linguistic resources, and eases probabilistic parsing by using statistics to restrict the properties to those that are satisfied or to the most frequent ones. In this context, our proposed enrichment method consists of a formalization phase, a phase that induces a Property Grammar from a source treebank, and a phase that regenerates the treebank with a new syntactic property-based representation. Starting with a formalization phase can help the resolution procedure succeed: it limits the specification of the data sets and of the interactions between them to those actually used, which avoids duplication. Formalization also makes it possible to anticipate the constraints that the problem must respect. The enrichment method was implemented and tested mainly on the Arabic treebank ATB. This experiment gave good and encouraging results and yielded various properties of different types.
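The idea of using statistics to restrict properties to the satisfied or most frequent ones could be sketched as follows. This is an illustrative assumption about how such filtering might work, not the procedure used in the paper. It handles only linearity properties, the constituent data are toy examples rather than ATB output, and the function names and the `threshold` parameter are hypothetical.

```python
from collections import Counter

def linearity_stats(constituents):
    """Return the satisfaction ratio of each candidate property (label, a, b),
    read as "inside `label`, every a precedes every b".

    `constituents` is an iterable of (label, tuple_of_child_labels).
    """
    applicable = Counter()   # constituents in which a and b co-occur
    satisfied = Counter()    # those among them in which a precedes b
    for label, children in constituents:
        cats = set(children)
        for a in cats:
            for b in cats:
                if a == b:
                    continue
                applicable[(label, a, b)] += 1
                last_a = max(i for i, c in enumerate(children) if c == a)
                first_b = min(i for i, c in enumerate(children) if c == b)
                if last_a < first_b:
                    satisfied[(label, a, b)] += 1
    return {key: satisfied[key] / applicable[key] for key in applicable}

def keep_frequent(stats, threshold=0.9):
    """Keep only properties whose satisfaction ratio reaches the threshold."""
    return {key: ratio for key, ratio in stats.items() if ratio >= threshold}

if __name__ == "__main__":
    # Toy constituents; labels are illustrative, not ATB tags.
    toy = [
        ("NP", ("DET", "N")),
        ("NP", ("DET", "N")),
        ("NP", ("N", "DET")),
        ("VP", ("V", "NP")),
    ]
    print(keep_frequent(linearity_stats(toy), threshold=0.6))
```

With a threshold of 1.0 the filter keeps only properties that are always satisfied. Lower thresholds keep frequent but violable properties, which is the kind of information a probabilistic parser could exploit.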