Improving Short Answer Grading Using Transformer-Based Pre-training

Sung, Chul; Dhamecha, Tejas I.; Mukhi, Nirmal

doi:10.1007/978-3-030-23204-7_39

Cited by 87 publications

(60 citation statements)

References 19 publications

“…We leave a multi-task formulation of our application setting for future work. Sung et al (2019) demonstrated state-of-the-art performance for similarity-based content scoring on the SemEval benchmark dataset. In this work, we use pre-trained transformer models for instance-based content scoring (cf.…”

Section: Related Work (mentioning)

confidence: 97%

Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

2020

Readability assessment aims to automatically classify text by the level appropriate for learning readers. Traditional approaches to this task utilize a variety of linguistically motivated features paired with simple machine learning models. More recent methods have improved performance by discarding these features and utilizing deep learning models. However, it is unknown whether augmenting deep learning models with linguistically motivated features would improve performance further. This paper combines these two approaches with the goal of improving overall model performance and addressing this question. Evaluating on two large readability corpora, we find that, given sufficient training data, augmenting deep learning models with linguistically motivated features does not improve state-of-the-art performance. Our results provide preliminary evidence for the hypothesis that the state-of-the-art deep learning models represent linguistic features of the text related to readability. Future research on the nature of representations formed in these models can shed light on the learned features and their relations to linguistically motivated ones hypothesized in traditional approaches.

Section: Related Work (mentioning)

confidence: 97%

Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

2020

“…More recently, advanced NLP techniques, such as neural network-based distributed language representation learning approaches (e.g., word2vec) and transfer learning approaches (e.g., BERT), have been applied to short answer grading [34,44,45]. In massive open online courses (MOOCs), NLP techniques along with classification algorithms (e.g., logistic regression, random forest) have examined data from discussion forums for a wide range of tasks such as predicting students' learning outcomes, sentiment analysis [27], confusion detection [14], and cognitive presence [3,12].…”

Section: Natural Language Processing In Learning Analytics (mentioning)

confidence: 99%

Detecting Disruptive Talk in Student Chat-Based Discussion within Collaborative Game-Based Learning Environments

Park, Sohn, Mott, et al.

2021

LAK21: 11th International Learning Analytics and Knowledge Conference

Collaborative game-based learning environments offer significant promise for creating engaging group learning experiences. Online chat plays a pivotal role in these environments by providing students with a means to freely communicate during problem solving. These chat-based discussions and negotiations support the coordination of students' in-game learning activities. However, this freedom of expression comes with the possibility that some students might engage in undesirable communicative behavior. A key challenge posed by collaborative game-based learning environments is how to reliably detect disruptive talk that purposefully disrupts team dynamics and problem-solving interactions. Detecting disruptive talk during collaborative game-based learning is particularly important because if it is allowed to persist, it can generate frustration and significantly impede the learning process for students. This paper analyzes disruptive talk in a collaborative game-based learning environment for middle school science education to investigate how such behaviors influence students' learning outcomes and vary across gender and students' prior knowledge. We present a disruptive talk detection framework that automatically detects disruptive talk in chat-based group conversations. We further investigate both classic machine learning and deep learning models for the framework utilizing a range of dialogue representations as well as supplementary information such as student gender. Findings show that long short-term memory network (LSTM)-based disruptive talk detection models outperform competitive baseline models, indicating that the LSTM-based disruptive talk detection framework

“…Specifically, bidirectional encoder representations from transformers (BERT), a pre-trained multilayer bidirectional transformer network (Vaswani et al, 2017) released by the Google AI Language team, have achieved state-of-the-art results in various NLP tasks, such as question answering, named entity recognition, natural language inference, and text classification (Devlin et al, 2019). BERT was also applied to AES (Rodriguez et al, 2019) and automated short-answer grading (Sung et al, 2019) in 2019, and demonstrated good performance.…”

Section: Transformer-based Model (mentioning)

confidence: 99%

Neural Automated Essay Scoring Incorporating Handcrafted Features

Uto, Xie, Ueno

2020

Proceedings of the 28th International Conference on Computational Linguistics

Automated essay scoring (AES) is the task of automatically assigning scores to essays as an alternative to grading by human raters. Conventional AES typically relies on handcrafted features, whereas recent studies have proposed AES models based on deep neural networks (DNNs) to obviate the need for feature engineering. Furthermore, hybrid methods that integrate handcrafted features in a DNN-AES model have been recently developed and have achieved state-of-the-art accuracy. One of the most popular hybrid methods is formulated as a DNN-AES model with an additional recurrent neural network (RNN) that processes a sequence of handcrafted sentence-level features. However, this method has the following problems: 1) It cannot incorporate effective essay-level features developed in previous AES research. 2) It greatly increases the numbers of model parameters and tuning parameters, increasing the difficulty of model training. 3) It has an additional RNN to process sentence-level features, making extension to various DNN-AES models complex. To resolve these problems, we propose a new hybrid method that integrates handcrafted essay-level features into a DNN-AES model. Specifically, our method concatenates handcrafted essay-level features to a distributed essay representation vector, which is obtained from an intermediate layer of a DNN-AES model. Our method is a simple DNN-AES extension, but significantly improves scoring accuracy.
