Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019) 2019
DOI: 10.18653/v1/w19-7811

tweeDe – A Universal Dependencies treebank for German tweets

Abstract: We introduce the first German treebank for Twitter microtext, annotated within the framework of Universal Dependencies. The new treebank includes over 12,000 tokens from over 500 tweets, independently annotated by two human coders. In the paper, we describe the data selection and annotation process and present baseline parsing results for the new testsuite.

Cited by 10 publications (11 citation statements)
References 9 publications
“…However, in terms of other language's noisy treebanks, the largest one is German tweets treebank tweeDe, with more than 12,000 tweets. tweeDe has an accuracy of 80.65% UAS and 72.69% LAS [20]. PoSTWITA-UD, an Italian tweet treebank is second largest Tweet treebank with 6,700 tweets and has an accuracy of 86.95% UAS and 81.5% LAS [22].…”
Section: Results; citation type: mentioning
confidence: 99%
“…Nonetheless, a pipeline for tokenizing, tagging, and parsing the tweets was trained by them, and ensemble and distillation models were developed for parsing accuracy improvement. [20] developed and annotated TweeDe, the first German Twitter treebank, as a new training and test suite for UD parsing. TweeDe includes more than 12,000 tokens of informal private communication, annotated for PoS, morphology and UD syntactic dependencies.…”
Section: Related Work; citation type: mentioning
confidence: 99%
“…In the future, in order to validate our findings, we will propose a novel and wider experimental setting, where more languages are included, e.g. testing our approach on other languages for which both irony-annotated datasets and UD resources are available, such as Arabic (Ghanem et al, 2020; or German (Rehbein et al, 2019).…”
Section: Discussion; citation type: mentioning
confidence: 97%
“…We use treebanks annotated with Universal Dependencies V2.7 (Nivre et al, 2020). For German, we use GSD, which is based on news, reviews, and Wikipedia pages, and tweeDe (Rehbein et al, 2019) as the Twitter treebank. For Italian, we use ISDT and Par-TUT, which consist of legal, news, and Wikipedia texts, plus TWITTIRÒ (Cignarella et al, 2019) and PoSTWITA (Sanguinetti et al, 2018) as the Twitter treebanks.…”
Section: Treebanks; citation type: mentioning
confidence: 99%