“…Czech Electra model (Kocián et al., 2021), two multilingual models, mBERT (Devlin et al., 2019) and XLM-R (Conneau et al., 2020), and the original monolingual English BERT model (Devlin et al., 2019). We fine-tune all the models for the binary classification task, i.e., subjective vs. objective sentence detection. For all models based on the original BERT architecture, we use the hidden vector h ∈ ℝ^H of the classification token [CLS], which represents the entire input sequence, where H is the hidden size of the model.…”
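The following is a minimal sketch of the described setup, assuming the HuggingFace transformers API; the model name, example sentence, and label convention are illustrative assumptions, not details taken from the paper. It shows a binary classification head placed on the hidden vector h ∈ ℝ^H of the [CLS] token, which occupies position 0 of the encoder output.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class SubjectivityClassifier(nn.Module):
    """Binary classifier on top of the [CLS] hidden vector h ∈ R^H."""

    def __init__(self, model_name: str = "bert-base-cased"):  # model name is an assumption
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden_size = self.encoder.config.hidden_size  # H, the model's hidden size
        # Linear layer mapping h to two logits: subjective vs. objective.
        self.classifier = nn.Linear(hidden_size, 2)

    def forward(self, input_ids, attention_mask):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # last_hidden_state has shape (batch, seq_len, H); index 0 is [CLS].
        h_cls = outputs.last_hidden_state[:, 0, :]
        return self.classifier(h_cls)


# Usage sketch: score one sentence (label convention assumed, not from the paper).
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = SubjectivityClassifier("bert-base-cased")
batch = tokenizer(["The film was breathtaking."], return_tensors="pt",
                  truncation=True, padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])
pred = logits.argmax(dim=-1).item()  # 0 = objective, 1 = subjective (assumed)
```

During fine-tuning, the encoder weights and the classification head would be trained jointly with a cross-entropy loss over the two classes; the same head construction applies to any of the BERT-derived models listed above, since each exposes a [CLS]-position hidden state of size H.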