Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications 2020
DOI: 10.18653/v1/2020.bea-1.15

Should You Fine-Tune BERT for Automated Essay Scoring?

Abstract: Most natural language processing research now recommends large Transformer-based models with fine-tuning for supervised classification tasks; older strategies like bag-of-words features and linear models have fallen out of favor. Here we investigate whether, in automated essay scoring (AES) research, deep neural models are an appropriate technological choice. We find that fine-tuning BERT produces similar performance to classical models at significant additional cost. We argue that while state-of-the-art strate…
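For context, a minimal sketch of the kind of classical baseline the abstract contrasts with fine-tuned BERT: bag-of-words features fed to a linear model. This is an illustrative scikit-learn pipeline, not the paper's actual setup; the file name and column names are assumptions.

```python
# Classical AES baseline sketch: TF-IDF bag-of-words features + a linear (ridge) regressor.
# The CSV path and column names ("text", "score") are illustrative assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

essays = pd.read_csv("essays.csv")                   # assumed columns: "text", "score"
train = essays.sample(frac=0.8, random_state=0)
test = essays.drop(train.index)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),   # unigram/bigram bag-of-words features
    Ridge(alpha=1.0),                                # linear regressor over sparse features
)
model.fit(train["text"], train["score"])
predictions = model.predict(test["text"]).round()    # essay scores are typically integers
```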

Cited by 82 publications (70 citation statements)
References 50 publications
“…Although the approach of [Fonseca et al. 2018] achieved better results in both metrics for each competence (C1 to C5), these results are not adequate for summative student assessment, since in the AES field QWK values between 0.6 and 0.8 are usually used as a floor for testing purposes [Mayfield and Black 2020]. Furthermore, the method of [Fonseca et al. 2018], which achieved 75.20% in the QWK metric on their corpus, reached only 51% on the Essay-BR.…”
Section: Experiments and Results (mentioning)
confidence: 92%
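For reference, the quadratic weighted kappa (QWK) values behind the 0.6-0.8 threshold mentioned above can be computed with scikit-learn's quadratically weighted Cohen's kappa. The score vectors below are made-up illustrations, not data from either cited paper.

```python
# QWK: agreement between human and model scores with quadratic weights.
# The example scores are invented purely to show the call.
from sklearn.metrics import cohen_kappa_score

human_scores = [2, 3, 4, 4, 1, 3, 2, 5]
model_scores = [2, 3, 3, 4, 2, 3, 2, 4]

qwk = cohen_kappa_score(human_scores, model_scores, weights="quadratic")
print(f"QWK = {qwk:.3f}")   # values of roughly 0.6-0.8 are often treated as a floor for testing use
```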
“…Then, the embedding representation w_t corresponding to w_t is calculable as a dot product w_t = A · w_t.
- RNN-based models (Taghipour and Ng 2016; Alikaniotis et al 2016)
- Hierarchical representation models (Dong and Zhang 2016; Dong et al 2017)
- Coherence models (Tay et al 2018; Li et al 2018; Farag et al 2018; Mesgar and Strube 2018; Yang and Zhong 2021)
- BERT-based models (Nadeem et al 2019; Rodriguez et al 2019; Yang et al 2020; Mayfield and Black 2020)
- Hybrid models (Dasgupta et al 2018)
- Robust model (Uto and Okano 2020)…”
Section: RNN-based Model (mentioning)
confidence: 99%
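The quoted embedding step amounts to selecting a column of an embedding matrix A with a one-hot word vector. A toy NumPy sketch of that product follows; the dimensions and vocabulary size are assumptions, not values from the survey.

```python
# One-hot word vector w_t multiplied by embedding matrix A gives the dense
# representation w_t = A · w_t used by RNN-based scorers. Toy dimensions only.
import numpy as np

vocab_size, embed_dim = 10, 4
rng = np.random.default_rng(0)
A = rng.normal(size=(embed_dim, vocab_size))    # embedding matrix, one column per vocabulary word

word_index = 7
w_onehot = np.zeros(vocab_size)
w_onehot[word_index] = 1.0

w_embedded = A @ w_onehot                       # equivalent to selecting column 7 of A
assert np.allclose(w_embedded, A[:, word_index])
```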
“…Similarly, [30] proposed a BERT architecture for the AES task. The authors utilized the pretrained BERT embeddings and then applied fine-tuning.…”
Section: A Supervised AES (mentioning)
confidence: 99%
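A hedged sketch of that general recipe (a pretrained BERT encoder with a fine-tuned scoring head) using Hugging Face Transformers is shown below. This is not the cited authors' implementation; the model checkpoint, score scale, and hyperparameters are illustrative assumptions.

```python
# Sketch of "pretrained BERT + fine-tuning" for essay scoring as a regression task.
# Checkpoint, learning rate, essay text, and gold score are all illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1   # single regression head producing one score
)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

essay = "Technology has changed the way students learn ..."
target = torch.tensor([[3.0]])          # assumed gold score on some rubric scale

inputs = tokenizer(essay, truncation=True, max_length=512, return_tensors="pt")
output = model(**inputs, labels=target)  # MSE loss is used when num_labels == 1
output.loss.backward()
optimizer.step()                         # one fine-tuning step on a single essay
```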