Handbook of Linguistic Annotation 2017
DOI: 10.1007/978-94-024-0881-2_21

Prague Dependency Treebank

Abstract: We present a richly annotated and genre-diversified language resource, the Prague Dependency Treebank-Consolidated 1.0 (PDT-C 1.0). Its purpose, as has always been the case for the family of the Prague Dependency Treebanks, is to serve both as training data for various types of NLP tasks and as a basis for linguistically oriented research. PDT-C 1.0 contains four different datasets of Czech, uniformly annotated using the standard PDT scheme (albeit not everything is annotated manually, as we describe in …

Cited by 51 publications (41 citation statements). References: 17 publications.
“…Our treebanks come from version 2.0 of the HamleDT collection of treebanks [32,33]. This collection harmonizes previously existing treebanks for 30 different languages into two widely-used annotation guidelines: Universal Stanford dependencies [34] and Prague dependencies [35]. Therefore, this resource allows us to evaluate the baselines not only across a wide range of languages of different families, but also across two well-known annotation schemes.…”
Section: Methods (mentioning)
confidence: 99%
“…While many Czech corpora have morphological annotation (done automatically), we also have to take syntax into account. Nowadays, several richly syntactically annotated corpora, collectively called the Prague Dependency Treebanks (PDTs in the sequel; [9]), have already been developed. These corpora provide a large number of valuable examples that serve as a basis for determining the subcategorized meanings of adverbials.…”
Section: Data: Prague Dependency Treebanks (mentioning)
confidence: 99%
“…The main features of the annotation style are: a well-developed dependency syntax theory known as the Functional Generative Description (FGD in the sequel; see [22], [24], [25]); interlinked hierarchical layers of standoff annotation; and a deep syntactic layer. From 1996 through 2005, the first Prague Dependency Treebank (PDT in the sequel; [11]; for the latest version, 3.0, see [2]) was designed and built. The data in PDT consist of articles from Czech daily newspapers.…”
Section: Data: Prague Dependency Treebanks (mentioning)
confidence: 99%
“…As two of the languages make extensive use of zero subjects, we could miss a lot of valuable information if we annotated coreference only on the surface. Therefore, we adopted the style based on the theory of Functional Generative Description (Sgall et al., 1986), first used for Czech in the Prague Dependency Treebank 2.0 (Hajič et al., 2006) and for Czech and English in the Prague Czech-English Dependency Treebank 2.0 (Hajič et al., 2012). In this style, coreference and other anaphoric relations are annotated on the layer of deep syntax, called the tectogrammatical layer.…”
Section: Introduction (mentioning)
confidence: 99%