Proceedings of the Workshop on Computational Approaches to Arabic Script-Based Languages - Semitic '04 2004
DOI: 10.3115/1621804.1621808

Developing an Arabic treebank

Abstract: In this paper we address the following questions from our experience of the last two and a half years in developing a large-scale corpus of Arabic text annotated for morphological information, part-of-speech, English gloss, and syntactic structure: (a) How did we 'leapfrog' through the stumbling blocks of both methodology and training in setting up the Penn Arabic Treebank (ATB) annotation? (b) How did we reconcile the Penn Treebank annotation principles and practices with the Modern Standard Arabic (MSA) trad…

Cited by 64 publications (38 citation statements)
References 4 publications
“…The Penn Arabic Treebank (Maamouri et al, 2009) is a Linguistic Data Consortium (LDC) project, for which there are currently 12 parts for MSA. PATB consists of constituency trees, the sources of which are newswire articles from a variety of news sources.…”
Section: Patb (mentioning)
confidence: 99%
“…The ATB [11], is the richest Arabic treebank in reliable annotations (POS tags, syntactic and semantic hashtags), which are also compatible to consensus developed and validated by linguists. Its source documents are relevant, varied and large.…”
Section: Related Work (mentioning)
confidence: 99%
“…Table 1. The examples of the Arabic sentences parsed according to the annotation of the ATB. In Table 1 above, the meanings of the symbols S, T, G and P are respectively as follows: the Arabic Sentences, the Buckwalter Transliterations of the sentences, their Gloss in English language and their Parsing representations according to the ATB annotations [11]. In the following, we present some structure examples extracted from these sentences to explain the application of different syntactic properties of the GP formalism.…”
Section: Arabic Linguistic Study Within the GP Formalism (mentioning)
confidence: 99%
“…The Penn Arabic Treebank (Maamouri et al, 2004; Maamouri et al, 2009) is a Linguistic Data Consortium (LDC) project, for which there are currently 12 parts for MSA. PATB consists of constituency trees, the sources of which are newswire articles from a variety of news sources.…”
Section: Patb (mentioning)
confidence: 99%