Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.834
Pushing on Text Readability Assessment: A Transformer Meets Handcrafted Linguistic Features

Abstract: We report two essential improvements in readability assessment: 1. three novel features in advanced semantics and 2. the timely evidence that traditional ML models (e.g. Random Forest, using handcrafted features) can combine with transformers (e.g. RoBERTa) to augment model performance. First, we explore suitable transformers and traditional ML models. Then, we extract 255 handcrafted linguistic features using self-developed extraction software. Finally, we assemble those to create several hybrid models, achieving…
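The hybrid setup the abstract describes (a transformer plus handcrafted features feeding a traditional ML model) can be illustrated roughly as below. This is a minimal sketch, not the authors' exact pipeline: the mean-pooling choice, the roberta-base checkpoint, and the simple feature concatenation are assumptions for illustration.

```python
# Sketch of a hybrid readability model: concatenate a RoBERTa text embedding
# with handcrafted linguistic features and train a Random Forest on the result.
# Pooling strategy and concatenation are illustrative assumptions.
import numpy as np
import torch
from transformers import RobertaModel, RobertaTokenizer
from sklearn.ensemble import RandomForestClassifier

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
encoder = RobertaModel.from_pretrained("roberta-base")

def embed(text: str) -> np.ndarray:
    """Mean-pool the last hidden layer into one fixed-size vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

def hybrid_matrix(texts, handcrafted):
    """texts: list of passages; handcrafted: (n_samples, n_features) array of
    linguistic features from an external extractor (hypothetical input here)."""
    embeddings = np.stack([embed(t) for t in texts])
    return np.hstack([embeddings, handcrafted])

# Usage (labels and feature arrays are placeholders):
# X = hybrid_matrix(texts, handcrafted_features)
# clf = RandomForestClassifier(n_estimators=300).fit(X, grade_labels)
```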

Cited by 30 publications (47 citation statements)
References: 65 publications
“…It would also be interesting to assess the effects of other domain-general psychological factors and emotions in L2 acquisition and learning. Another possible area of future research would be to examine the text readability thoroughly by using machine learning models (Lee et al., 2021). The current study particularly adopted Flesch-Kincaid Grade Level in the correlation, regression and SEM analysis.…”
Section: Discussion
confidence: 99%
“…We used the Welch t-test (Welch, 1947) with Bonferroni correction, resulting in an alpha level of 0.0167 (0.05/3). For the variables of interest, we extracted over 160 linguistic features using the LingFeat tool by Lee et al. (2021)…”
Section: Difference in Prompt and Continuation Complexities
confidence: 99%
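The statistical setup described in that statement — Welch's t-test with a Bonferroni-corrected alpha of 0.05/3 ≈ 0.0167 — maps directly onto SciPy. A minimal sketch, with the grouping of comparisons and all variable names being illustrative assumptions:

```python
from scipy import stats

def welch_with_bonferroni(pairs, alpha=0.05):
    """pairs: list of (sample_a, sample_b) arrays, one pair per feature compared.
    Returns the corrected alpha and per-comparison (t, p, significant) tuples."""
    corrected = alpha / len(pairs)  # e.g. 0.05 / 3 = 0.0167 for three comparisons
    results = []
    for a, b in pairs:
        res = stats.ttest_ind(a, b, equal_var=False)  # equal_var=False -> Welch's t-test
        results.append((res.statistic, res.pvalue, res.pvalue < corrected))
    return corrected, results

# Hypothetical usage with three feature comparisons:
# corrected_alpha, tests = welch_with_bonferroni(
#     [(prompt_lengths, continuation_lengths),
#      (prompt_ttr, continuation_ttr),
#      (prompt_fkgl, continuation_fkgl)])
```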
“…In this study, we formally introduce the task of uniform complexity for text generation, or UCTG, in an open-ended narrative generation setting using the WRITINGPROMPTS dataset from Reddit (Fan et al., 2018). To cover a wide range of measures used for approximating linguistic complexity, we extracted over 160 features drawing concepts from age-of-acquisition in developmental studies, word- and phrase-level part-of-speech, discourse from entity mentions, and formula-based readability indices (Lee et al., 2021). We compare text continuations from three models: humans, an off-the-shelf GPT model, and a fine-tuned GPT-2 model.…”
Section: Introduction
confidence: 99%
“…Several attempts have been made to measure readability, e.g. (a) eye movements, (b) word difficulty, (c) semantic richness (Lee et al., 2021), (d) N-gram analysis (Xia et al., 2016) and (e) cognitive-motivated features (Feng et al., 2009). Readability has been measured through a variety of metrics such as word length and sentence length.…”
Section: Previous Studies
confidence: 99%
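As a small illustration of the surface signals named at the end of that statement (word length, sentence length) alongside a basic n-gram count, a hedged sketch; the metric choices and function name are assumptions for illustration, not any cited system's feature set:

```python
from collections import Counter
import re

def surface_profile(text: str, n: int = 2) -> dict:
    """Average word length, average sentence length (in words), and the most
    frequent word n-grams -- simple surface signals used in readability work."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    ngrams = Counter(zip(*(words[i:] for i in range(n))))
    return {
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
        "top_ngrams": ngrams.most_common(5),
    }
```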