Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanit 2017
DOI: 10.18653/v1/w17-2215

Lexical Correction of Polish Twitter Political Data

Abstract: Language processing architectures are often evaluated in near-perfect conditions with respect to the processed content. Tools which perform sufficiently well on electronic press, books and other types of non-interactive content may poorly handle the noisy, colloquial and multilingual textual data which make up the majority of communication today. This paper investigates how Polish Twitter data (of a slightly controlled 'political' flavour) differs from the expectations of linguistic tools and how it could be co…

Cited by 5 publications (6 citation statements)
References 11 publications
“…Another simple approach is the aforementioned diacritical swapping, which is a term that we introduce here for referring to a solution inspired by the work of (Ogrodniczuk and Kopeć, 2017). Namely, from the incorrect form we try to produce all strings obtainable by either adding or removing diacritical marks from characters.…”
Section: Baseline Methods (mentioning)
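The diacritical swapping idea quoted above (generate every string obtainable by adding or removing diacritical marks) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of Ogrodniczuk and Kopeć (2017) or of the citing paper: the letter groups in `DIACRITIC_GROUPS`, the helper names `diacritical_swaps` and `correct`, and the lexicon used for filtering are hypothetical, and input is assumed to be lowercase.

```python
from itertools import product

# Assumption: lowercase Polish letters grouped with their diacritic variants.
# 'z' has two diacritic forms (ź, ż), so its group has three members.
DIACRITIC_GROUPS = ["aą", "cć", "eę", "lł", "nń", "oó", "sś", "zźż"]

# Map every character in a group to the whole group, so a letter can be
# swapped in either direction (adding or removing a diacritic).
ALTERNATIVES = {ch: group for group in DIACRITIC_GROUPS for ch in group}


def diacritical_swaps(word: str) -> set[str]:
    """Return every string obtainable by adding/removing diacritics in `word`.

    Characters outside the groups are kept unchanged; the original word
    itself is included in the result.
    """
    options = [ALTERNATIVES.get(ch, ch) for ch in word]
    return {"".join(chars) for chars in product(*options)}


def correct(word: str, lexicon: set[str]) -> list[str]:
    """Hypothetical baseline: keep only the swapped variants found in a lexicon."""
    return sorted(v for v in diacritical_swaps(word) if v in lexicon)


if __name__ == "__main__":
    print(correct("zolw", {"żółw", "kot"}))  # ['żółw']
```

Because every eligible character multiplies the number of candidates (by 2, or by 3 for "z"), the candidate set grows exponentially with the number of such characters in the word; "zolw" already yields 12 variants. A filter such as the assumed lexicon lookup is therefore needed to keep only plausible corrections.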
“…Error cases provided by PlEWi are, therefore, not a balanced representation of spelling errors in written Polish language. PlEWi does have the advantage of scale in comparison to existing literature, such as (Ogrodniczuk and Kopeć, 2017) operating on a set of only 740 annotated errors in tweets.…”
Section: Methods (mentioning)