Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications 2020
DOI: 10.18653/v1/2020.bea-1.1

Linguistic Features for Readability Assessment

Abstract: Readability assessment aims to automatically classify text by the level appropriate for learning readers. Traditional approaches to this task utilize a variety of linguistically motivated features paired with simple machine learning models. More recent methods have improved performance by discarding these features and utilizing deep learning models. However, it is unknown whether augmenting deep learning models with linguistically motivated features would improve performance further. This paper combines these …

Cited by 56 publications (54 citation statements)
References 18 publications
“…Overall, the BERT embedding is a powerful feature for predicting document readability on the Cambridge Readability Dataset. Ablating the BERT embeddings (Table 3) significantly decreases document accuracy (−0.112), which is consistent with previous work (Martinc et al., 2019; Deutsch et al., 2020) finding BERT to be one of the best-performing methods for predicting document readability on one of the datasets they used, and with HAN performing relatively poorly because it does not use the BERT embeddings.…”
Section: Ablation Study on Features (supporting)
confidence: 88%
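The ablation described in this statement can be pictured with a minimal sketch like the one below: the same simple classifier is trained with and without a BERT-embedding feature block, and the change in document accuracy is reported. The dataset, feature sizes, and classifier here are placeholders, not the cited papers' actual pipelines.

```python
# Sketch of a BERT-embedding ablation: compare accuracy with and without the embedding block.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_docs, n_bert, n_ling = 500, 768, 40            # hypothetical sizes
X_bert = rng.normal(size=(n_docs, n_bert))       # stand-in for BERT document embeddings
X_ling = rng.normal(size=(n_docs, n_ling))       # stand-in for linguistic features
y = rng.integers(0, 5, size=n_docs)              # five readability levels

def fit_and_score(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))

acc_full = fit_and_score(np.hstack([X_bert, X_ling]), y)
acc_ablated = fit_and_score(X_ling, y)           # BERT embeddings removed
print(f"full: {acc_full:.3f}  ablated: {acc_ablated:.3f}  delta: {acc_ablated - acc_full:+.3f}")
```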
“…Baselines We compare our method against methods used in previous work (Feng et al., 2010; Vajjala and Meurers, 2012; Martinc et al., 2019; Deutsch et al., 2020): (1) logistic regression for classification (LR cls), (2) linear regression for regression (LR regr), (3) Gradient Boosted Decision Tree (GBDT), and (4) Hierarchical Attention Network (Yang et al., 2016; HAN), which is reported as one of the state-of-the-art methods in readability assessment for documents (Martinc et al., 2019; Deutsch et al., 2020).…”
Section: Methods (mentioning)
confidence: 99%
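For readers unfamiliar with the non-neural baselines listed in this statement, the rough sketch below shows how LR cls, LR regr, and GBDT might be instantiated with scikit-learn; the feature matrix and labels are toy placeholders rather than features from WeeBit or Newsela, and this is not the cited papers' exact configuration.

```python
# Sketch of the non-neural readability baselines: LR cls, LR regr, and GBDT.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.ensemble import GradientBoostingClassifier

X = np.random.rand(200, 30)                      # placeholder linguistic feature matrix
y = np.random.randint(0, 5, size=200)            # placeholder readability levels

lr_cls = LogisticRegression(max_iter=1000).fit(X, y)       # (1) logistic regression, classification
lr_regr = LinearRegression().fit(X, y.astype(float))       # (2) linear regression, levels as a scale
gbdt = GradientBoostingClassifier().fit(X, y)               # (3) gradient boosted decision tree

# Regression predictions are rounded and clipped back onto the discrete levels.
pred_regr = np.clip(np.rint(lr_regr.predict(X)), 0, 4).astype(int)
```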
“…In terms of weighted F1-score, both strategies that use BERT (utilizing the BERT classifier directly or feeding BERT features to the SVM classifier as in Deutsch, Jasbi, and Shieber [2020]) seem to return similar results (Table 5: results of the supervised approach to readability in terms of accuracy, weighted precision, weighted recall, and weighted F1-score for the three neural network classifiers and methods from the literature). Finally, in terms of QWK, BERT achieves a very high score of 95.27% and the other two tested classifiers obtain a good QWK score close to 90%.…”
Section: Supervised Experimental Results (mentioning)
confidence: 86%
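The metrics quoted in this statement, weighted F1 and quadratic weighted kappa (QWK), can be computed as in the short sketch below; the label arrays are toy values for illustration only, not results from the cited experiments.

```python
# Sketch of the evaluation metrics: accuracy, weighted precision/recall/F1, and QWK.
from sklearn.metrics import (accuracy_score, precision_recall_fscore_support,
                             cohen_kappa_score)

y_true = [0, 1, 2, 2, 3, 4, 4, 1]                # toy gold readability levels
y_pred = [0, 1, 2, 3, 3, 4, 3, 1]                # toy predicted levels

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)
qwk = cohen_kappa_score(y_true, y_pred, weights="quadratic")  # QWK rewards near-miss predictions
print(f"acc={acc:.2f} weighted-F1={f1:.2f} QWK={qwk:.2f}")
```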
“…The results of supervised readability assessment using different architectures of deep neural networks are presented in Table 5, together with the state-of-the-art baseline results from the related work (Xia, Kochmar, and Briscoe 2016; Filighera, Steuer, and Rensing 2019; Deutsch, Jasbi, and Shieber 2020). We only present the best result reported by each of the baseline studies; the only exception is Deutsch, Jasbi, and Shieber (2020), for which we present two results, SVM-BF (SVM with BERT features) and SVM-HF (SVM with HAN features), that proved the best on the WeeBit and Newsela corpora, respectively.…”
Section: Supervised Experimental Results (mentioning)
confidence: 99%
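As an illustration of what an "SVM with BERT features" configuration in the spirit of SVM-BF might look like, the hedged sketch below mean-pools BERT token embeddings into fixed document vectors and fits an SVM on them; the model name, pooling choice, and tiny toy corpus are assumptions for illustration, not the published implementation.

```python
# Sketch of an SVM trained on pooled BERT features as fixed document representations.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.svm import SVC

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc).last_hidden_state          # (batch, seq_len, 768)
    mask = enc["attention_mask"].unsqueeze(-1)
    return ((out * mask).sum(1) / mask.sum(1)).numpy()  # mean pooling over non-padding tokens

texts = ["The cat sat on the mat.",
         "Quantum chromodynamics describes the strong interaction."]
levels = [0, 4]                                       # toy readability labels
svm = SVC(kernel="linear").fit(embed(texts), levels)
```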