JUST-BLUE at SemEval-2021 Task 1: Predicting Lexical Complexity using BERT and RoBERTa Pre-trained Language Models

Yaseen, Tuqa Bani; Ismail, Qusai; Al-Omari, Sarah; Al-Sobh, Eslam; Abdullah, Malak

doi:10.18653/v1/2021.semeval-1.85

Cited by 9 publications

(15 citation statements)

References 12 publications

“…We evaluate the results of our system by applying the regression metrics in the execution of the supervised learning algorithms, specifically MAE, MSE, RMSE, and R2. We emphasize that we apply the methodologies of the winning teams, which are based on the application of language models based on the pre-trained and adjusted Transformers BERT and RoBERTa [9,34,35], together with the linguistic, syntactic and statistical characteristics, and the embedding results at the word and sentence level.…”

Section: Results | mentioning

confidence: 99%

Combining Transformer Embeddings with Linguistic Features for Complex Word Identification

2022

Identifying which words present in a text may be difficult to understand by common readers is a well-known subtask in text complexity analysis. The advent of deep language models has also established the new state-of-the-art in this task by means of end-to-end semi-supervised (pre-trained) and downstream training of, mainly, transformer-based neural networks. Nevertheless, the usefulness of traditional linguistic features in combination with neural encodings is worth exploring, as the computational cost needed for training and running such networks is becoming more and more relevant with energy-saving constraints. This study explores lexical complexity prediction (LCP) by combining pre-trained and adjusted transformer networks with different types of traditional linguistic features. We apply these features over classical machine learning classifiers. Our best results are obtained by applying Support Vector Machines on an English corpus in an LCP task solved as a regression problem. The results show that linguistic features can be useful in LCP tasks and may improve the performance of deep learning systems.

Section: Results | mentioning

confidence: 99%
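
The regression metrics named in the citation statement above (MAE, MSE, RMSE, R2), along with the Pearson correlation used to rank systems in the shared task, can be illustrated with a minimal sketch. The function names and the toy complexity scores below are assumptions made for illustration only; they are not taken from the cited papers or from any system's code.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for two equal-length lists of floats."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # R2 compares the residual error with the variance of the gold scores.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all gold scores are equal")
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


def pearson(x, y):
    """Return Pearson's correlation coefficient between two lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sum(a * b for a, b in zip(dx, dy)) / denom


# Hypothetical gold and predicted complexity scores in [0, 1].
gold = [0.1, 0.3, 0.5, 0.7]
predicted = [0.2, 0.3, 0.4, 0.8]
print(regression_metrics(gold, predicted))
print(pearson(gold, predicted))
```

The error metrics penalize any deviation from the gold score, whereas Pearson's correlation only measures how well the predictions track the gold scores linearly, so a system can rank well on one and poorly on the other.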

“…Deep learning models are significantly improved over "shallow" machine learning models with the advent of transfer learning and pre-trained language models. The BERT and XLM-RoBERTa pre-trained deep learning language models are considered to be at the forefront of many NLP tasks [9].…”

Section: Introduction | mentioning

confidence: 99%

Combining Transformer Embeddings with Linguistic Features for Complex Word Identification

2022

“…Just Blue by Yaseen et al [149], achieved the highest Pearson's Correlation at LCP-2021's sub-task 1 of 0.7886 [118]. It was inspired by the prior state-of-the-art performance of ensemble-based models together with the recent headway in various NLP-related tasks made by transformers [149].…”

Section: 31 | mentioning

confidence: 99%

“…Pan et al [99] contributed their model's good performance in both sub-tasks to its use of multiple transformers and training strategies. With model diversity also being an influential factor in regards to Just Blue's high performance [149], it would appear that current state-of-the-art LCP systems consist of an ensemble of differing transformers-based models.…”

Section: 31 | mentioning

confidence: 99%

Lexical Complexity Prediction: An Overview

2023

The occurrence of unknown words in texts significantly hinders reading comprehension. To improve accessibility for specific target populations, computational modelling has been applied to identify complex words in texts and substitute them for simpler alternatives. In this paper, we present an overview of computational approaches to lexical complexity prediction focusing on the work carried out on English data. We survey relevant approaches to this problem which include traditional machine learning classifiers (e.g. SVMs, logistic regression) and deep neural networks as well as a variety of features, such as those inspired by literature in psycholinguistics as well as word frequency, word length, and many others. Furthermore, we introduce readers to past competitions and available datasets created on this topic. Finally, we include brief sections on applications of lexical complexity prediction, such as readability and text simplification, together with related studies on languages other than English.

“…Like this team, the use of contextual embedding models stood out as a fundamental part of the presented systems. Such is the case of the JUST BLUE team [138] that leverages context information extracted from BERT and RoBERTa models, achieving the highest "Pearson's Correlation" score in the first task. Similarly, the RG_PA team [139] performs an assembly of RoBERTa models in its classification, obtaining the second highest Pearson's Correlation score in the second task.…”

Section: Substitute Ranking (Sr) | mentioning

confidence: 99%

Designing and Evaluating a User Interface for People with Cognitive Disabilities

Moreno, Alarcón, Martínez

2021

Proceedings of the XXI International Conference on Human Computer Interaction

The Internet has come a long way in recent years, contributing to the proliferation of large volumes of digitally available information. Through user interfaces we can access these contents; however, they are not accessible to everyone. The main users affected are people with disabilities, who are already a considerable number, but accessibility barriers affect a wide range of user groups and contexts of use in accessing digital information. Some of these barriers are caused by language inaccessibility when texts contain long sentences, unusual words and complex linguistic structures. These accessibility barriers directly affect people with cognitive disabilities. To make textual content more accessible, there are initiatives such as the Easy-to-Read guidelines, the Plain Language guidelines, and some of the language-specific Web Content Accessibility Guidelines (WCAG). These guidelines provide documentation, but they do not specify methods for systematically meeting the requirements they imply.
As a solution, methods from Natural Language Processing (NLP) can help achieve conformance with the language-related cognitive accessibility guidelines. The NLP task of text simplification aims to reduce the linguistic complexity of a text from a syntactic and a lexical perspective, the latter being the main focus of this thesis. In this regard, one solution space is to identify which words in a text are complex or uncommon and, where there are any, to provide a simpler, more common synonym together with a simple definition, all aimed at people with cognitive disabilities. To that end, this thesis presents the study, analysis, design and development of an architecture, NLP methods, resources and tools for the lexical simplification of Spanish texts in a generic domain within the field of cognitive accessibility. To achieve this, each of the steps in lexical simplification processes is studied, together with methods for word sense disambiguation. As a contribution, different types of word embeddi...
