Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021) 2021
DOI: 10.18653/v1/2021.semeval-1.2
OCHADAI-KYOTO at SemEval-2021 Task 1: Enhancing Model Generalization and Robustness for Lexical Complexity Prediction

Abstract: We propose an ensemble model for predicting the lexical complexity of words and multiword expressions (MWEs). The model receives as input a sentence with a target word or MWE and outputs its complexity score. Given that a key challenge with this task is the limited size of annotated data, our model relies on pretrained contextual representations from different state-of-the-art transformer-based language models (i.e., BERT and RoBERTa), and on a variety of training methods for further enhancing model generalization…
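The abstract describes an ensemble that maps a (sentence, target expression) pair to a single complexity score by combining several transformer-based regressors. The paper's exact architecture is not reproduced here; the following is a minimal sketch of the ensembling idea only, where the per-model scorers (`bert_like`, `roberta_like`) are hypothetical stand-ins for fine-tuned BERT/RoBERTa regression heads:

```python
from statistics import mean

def ensemble_complexity(sentence, target, scorers):
    """Average the complexity scores predicted by several models.

    `scorers` is a list of callables, each mapping
    (sentence, target) -> a complexity score in [0, 1].
    """
    return mean(scorer(sentence, target) for scorer in scorers)

# Hypothetical stand-ins for fine-tuned transformer regression heads:
bert_like = lambda sent, tgt: 0.30
roberta_like = lambda sent, tgt: 0.40

score = ensemble_complexity("The serum was centrifuged.", "serum",
                            [bert_like, roberta_like])
```

Averaging predictions is the simplest form of ensembling; it reduces the variance of any single fine-tuned model, which matters when annotated data is scarce.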

Cited by 4 publications (3 citation statements) | References 18 publications
“…We evaluate the results of our system by applying regression metrics to the supervised learning algorithms, specifically MAE, MSE, RMSE, and R². We emphasize that we apply the methodologies of the winning teams, which are based on language models built on the pre-trained and fine-tuned Transformers BERT and RoBERTa [9,34,35], together with linguistic, syntactic, and statistical features, and word- and sentence-level embeddings.…”
Section: Results
confidence: 99%
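The excerpt above evaluates with four standard regression metrics. As a reference point, a minimal self-contained implementation of MAE, MSE, RMSE, and R² (using their textbook definitions, not the cited system's code) looks like this:

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, and R² for paired lists of values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n          # mean absolute error
    mse = sum(e * e for e in errors) / n           # mean squared error
    rmse = math.sqrt(mse)                          # root mean squared error
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot                  # coefficient of determination
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}
```

R² compares the model's squared error against that of a constant baseline predicting the mean of `y_true`, so a perfect fit gives 1.0 and the mean-only baseline gives 0.0.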
“…The lack of sufficient labeled data and of domain- and gender-agnostic data limits the performance of those approaches (Islam et al., 2023a). Considering the scarcity of annotated data and the problem of predicting the lexical complexity of single-word and multi-word expressions, Taya et al. (2021) used an ensemble over a set of transformer-based models with hand-crafted features to increase model generalization and robustness. To improve sentiment analysis for low-resource languages such as Bangla, the authors (Rahman and Kumar Dey, 2018; Sultana et al., 2022) proposed aspect-based sentiment analysis using BOW and supervised machine-learning techniques and provided two datasets for aspect-based sentiment analysis.…”
Section: Related Work
confidence: 99%