The rapid growth of user-generated content on the web presents significant challenges for protecting Internet users from exposure to offensive material, such as cyberbullying and hate speech, and for limiting the spread of harmful behaviour. Designing automated detection models for such offensive content remains difficult, however, particularly for languages with limited publicly available data. To address this issue, we collaborate with the Wykop.pl web service to fine-tune a model on genuine content banned by professional moderators. In this paper, we focus on the Polish language, discuss the relevant datasets and annotation frameworks, and present a stylometric analysis of Wykop.pl content that identifies morpho-syntactic structures commonly used in cyberbullying and hate speech. In doing so, we contribute to the ongoing discussion of offensive language and hate speech in sociolinguistic studies, emphasizing the need to take user-generated online content into account.