Proceedings of the 10th Web as Corpus Workshop 2016
DOI: 10.18653/v1/w16-2606

EmpiriST 2015: A Shared Task on the Automatic Linguistic Annotation of Computer-Mediated Communication and Web Corpora

Abstract: This paper describes the goals, design and results of a shared task on the automatic linguistic annotation of German language data from genres of computer-mediated communication (CMC), social media interactions and Web corpora. The two subtasks of tokenization and part-of-speech tagging were performed on two data sets: (i) a genuine CMC data set with samples from several CMC genres, and (ii) a Web corpora data set of CC-licensed Web pages which represents the type of data found in large corpora crawled from th…
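The abstract names tokenization as one of the two subtasks. A common way to score a tokenizer against a gold standard is to compare token character spans and report precision, recall and F1. The sketch below assumes that scheme; it is not necessarily the official EmpiriST scorer, and the function names `token_spans` and `tokenization_prf` are hypothetical.

```python
from typing import List, Set, Tuple


def token_spans(text: str, tokens: List[str]) -> Set[Tuple[int, int]]:
    """Map each token to its (start, end) character offsets in `text`.

    Assumes tokens occur in order and are verbatim substrings of `text`;
    str.index raises ValueError if a token cannot be found.
    """
    spans: Set[Tuple[int, int]] = set()
    pos = 0
    for tok in tokens:
        start = text.index(tok, pos)  # skips any whitespace before the token
        end = start + len(tok)
        spans.add((start, end))
        pos = end
    return spans


def tokenization_prf(text: str, gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Span-based precision, recall and F1 of a predicted tokenization."""
    gold_spans = token_spans(text, gold)
    pred_spans = token_spans(text, pred)
    true_pos = len(gold_spans & pred_spans)  # tokens with identical boundaries
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical CMC example: the tokenizer fails to split off the emoticon.
print(tokenization_prf("Na super:-)", ["Na", "super", ":-)"], ["Na", "super:-)"]))
```

Because a token only counts as correct when both of its boundaries match, a single missed split (here `super:-)`) costs one false positive and two false negatives, which is why emoticons and other CMC-specific units weigh heavily in such a score.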

Cited by 15 publications (14 citation statements). References 24 publications.

“…We evaluate our approach on the dataset of the EmpiriST 2015 shared task on automatic linguistic annotation of computer-mediated communication and social media (Beißwenger et al 2016) and compare it to the two systems that performed best on the shared task as baselines.…”
Section: Discussion (mentioning)
confidence: 99%
“…Results on the EmpiriST 2015 shared task dataset (Beißwenger et al 2016) show that our approach improves accuracy on out-of-vocabulary words by up to 5.8%; overall, we improve state-of-the-art by 0.4% to 90.9% accuracy.…”
Section: Introduction (mentioning)
confidence: 86%
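The statement above reports POS tagging accuracy both overall and on out-of-vocabulary (OOV) words. The sketch below shows one way such figures can be computed, assuming a word counts as OOV when it does not occur in the training vocabulary; that definition, the function name `tagging_accuracy`, and the toy STTS-style data are illustrative assumptions, not the cited authors' evaluation code.

```python
from typing import List, Set, Tuple


def tagging_accuracy(
    gold: List[Tuple[str, str]],  # (word, gold tag) pairs
    pred_tags: List[str],         # predicted tag for each word, same order
    train_vocab: Set[str],        # words seen in training (assumed OOV criterion)
) -> Tuple[float, float]:
    """Return (overall accuracy, OOV accuracy) for one tagged sequence."""
    if len(gold) != len(pred_tags):
        raise ValueError("gold and predicted sequences differ in length")
    correct = oov_total = oov_correct = 0
    for (word, gold_tag), pred_tag in zip(gold, pred_tags):
        hit = int(gold_tag == pred_tag)
        correct += hit
        if word not in train_vocab:  # only unseen words enter the OOV score
            oov_total += 1
            oov_correct += hit
    overall = correct / len(gold) if gold else 0.0
    oov = oov_correct / oov_total if oov_total else 0.0
    return overall, oov


# Hypothetical CMC sentence; "hab" and "grad" are treated as unseen words.
gold = [("ich", "PPER"), ("hab", "VAFIN"), ("grad", "ADV"), ("kein", "PIAT"), ("bock", "NN")]
pred = ["PPER", "VAFIN", "ADJD", "PIAT", "NN"]
print(tagging_accuracy(gold, pred, {"ich", "kein", "bock"}))
```

Reporting the OOV score separately matters for CMC data, where nonstandard spellings make many tokens unseen; a gain on that subset can be large even when the overall accuracy moves by only a fraction of a point.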
“…For example, Ljubešić et al (2017) show that performing normalization, in addition to using external resources, can remove half of the errors of a standard POS tagger for South Slavic languages. Quite surprisingly, instead, of all systems participating in shared tasks on POS tagging of Twitter data for both Italian (Bosco et al, 2016) and German (Beißwenger et al, 2016), none of the participating systems incorporated any normalization strategy before performing POS tagging.…”
Section: Related Work (mentioning)
confidence: 99%
“…The trial corpus contains around 3,600 tokens (2,100 CMC, 1,500 Web) and was PoS tagged by one annotator (without systematic error checks). See Beißwenger et al (2016) for more details.…”
Section: German Wikipedia (W2v) (mentioning)
confidence: 99%