Learning Automated Essay Scoring Models Using Item-Response-Theory-Based Scores to Decrease Effects of Rater Biases

Uto, Masaki; Okano, Masashi

doi:10.1109/tlt.2022.3145352

Cited by 13 publications

(5 citation statements)

References 68 publications


“…This model integrated the capacities of BERT and RCNN to achieve better performance by capturing contextual semantic word‐level features and fusing sentence‐level features, which was consistent with the research (Ein‐Dor et al, 2020; Lai et al, 2015). Notably, the pre‐trained deep learning model can be expanded to other automated classification tasks with acceptable performance (Uto & Okano, 2021).…”

Section: Discussion (mentioning)

confidence: 99%

“…Technically, there is a wealth of semantic information available from peer grades and textual feedback (Uto & Okano, 2021), providing an opportunity to utilize AI as an effective solution for detecting reliability. AI is a broad concept that encompasses traditional machine learning, deep learning, emerging generative AI, and more (Zehner & Hahnel, 2023).…”

Section: Literature Review (mentioning)

confidence: 99%

“…Notably, the pre-trained deep learning model can be expanded to other automated classification tasks with acceptable performance (Uto & Okano, 2021).…”

Section: Effectiveness Of the Automated Grading Model With Reliabilit... (mentioning)

confidence: 99%


Exploring an effective automated grading model with reliability detection for large‐scale online peer assessment

Lin, Yan, Zhao. 2024. Computer Assisted Learning


Background: Peer assessment has played an important role in large‐scale online learning, as it helps promote the effectiveness of learners' online learning. However, with the emergence of numerical grades and textual feedback generated by peers, it is necessary to detect the reliability of the large amount of peer assessment data, and then develop an effective automated grading model to analyse the data and predict learners' learning results.

Objectives: The present study aimed to propose an automated grading model with reliability detection.

Methods: A total of 109,327 instances of peer assessment from a large‐scale teacher online learning program were tested in the experiments. The reliability detection approach included three steps: recurrent convolutional neural networks (RCNN) was used to detect grade consistency, bidirectional encoder representations from transformers (BERT) was used to detect text originality, and long short‐term memory (LSTM) was used to detect grade‐text consistency. Furthermore, the automated grading was designed with the BERT‐RCNN model.

Results and Conclusions: The effectiveness of the automated grading model with reliability detection was shown. For reliability detection, RCNN performed best in detecting grade consistency with an accuracy rate of 0.889, BERT performed best in detecting text originality with an improvement of 4.47% compared to the benchmark model, and LSTM performed best with an accuracy rate of 0.883. Moreover, the automated grading model with reliability detection achieved good performance, with an accuracy rate of 0.89. Compared to the absence of reliability detection, it increased by 12.1%.

Implications: The results strongly suggest that the automated grading model with reliability detection for large‐scale peer assessment is effective, with the following implications: (1) The introduction of reliability detection is necessary to help filter out low reliability data in peer assessment, thus promoting effective automated grading results. (2) This solution could assist assessors in adjusting the exclusion threshold of peer assessment reliability, providing a controllable automated grading tool to reduce manual workload with high quality. (3) This solution could shift educational institutions from labour‐intensive grading procedures to a more efficient educational assessment pattern, allowing for more investment in supporting instructors and learners to improve the quality of peer feedback.


“…The scores awarded by two teams of human raters, on the other hand, had a strong correlation. Item response theory (IRT) based essay scoring has been recently introduced in [22]. This model tries to reduce the effect of rater biases on the performance of essay scoring system.…”

Section: Literature Review (mentioning)

confidence: 99%

A comparison of various machine learning algorithms and execution of flask deployment on essay grading

Kotha, Gaddam, Siddenki, et al. 2023. IJECE


Students’ performance can be assessed based on grading the answers written by the students during their examination. Currently, students are assessed manually by the teachers. This is a cumbersome task due to an increase in the student-teacher ratio. Moreover, due to coronavirus disease (COVID-19) pandemic, most of the educational institutions have adopted online teaching and assessment. To measure the learning ability of a student, we need to assess them. The current grading system works well for multiple choice questions, but there is no grading system for evaluating the essays. In this paper, we studied different machine learning and natural language processing techniques for automated essay scoring/grading (AES/G). Data imbalance is an issue which creates the problem in predicting the essay score due to uneven distribution of essay scores in the training data. We handled this issue using random over sampling technique which generates even distribution of essay scores. Also, we built a web application using flask and deployed the machine learning models. Subsequently, all the models have been evaluated using accuracy, precision, recall, and F1-score. It is found that random forest algorithm outperformed the other algorithms with an accuracy of 97.67%, precision of 97.62%, recall of 97.67%, and F1-score of 97.58%.


“…As technology-powered advances are being incorporated into large-scale writing assessments, automated essay scoring (AES) has received increasing attention, offering a viable alternative to the traditionally time-intensive and laborious manual grading processes [1][2][3]. Due to remarkable advances in corpus linguistics [4,5], natural language processing (NLP) [6,7], and deep learning [3,8,9], AES has the benefits of improved consistency, reduced subjectivity, and constructive feedback by exploiting extensive linguistic features or incorporating cutting-edging algorithms [10][11][12][13][14]. Given the importance of AES, it is unsurprising that the investigation into the power of linguistic features characterizing writing quality has become a critical focus within the domains of writing assessment and instruction in the past five decades.…”

Section: Introduction (mentioning)

confidence: 99%

Incorporating Fine-Grained Linguistic Features and Explainable AI into Multi-Dimensional Automated Writing Assessment

Tang, Chen, Lin, et al. 2024. Applied Sciences


With the flourishing development of corpus linguistics and technological revolutions in the AI-powered age, automated essay scoring (AES) models have been intensively developed. However, the intricate relationship between linguistic features and different constructs of writing quality has yet to be thoroughly investigated. The present study harnessed computational analytic tools and Principal Component Analysis (PCA) to distill and refine linguistic indicators for model construction. Findings revealed that both micro-features and their combination with aggregated features robustly described writing quality over aggregated features alone. Linear and non-linear models were thus developed to explore the associations between linguistic features and different constructs of writing quality. The non-linear AES model with Random Forest Regression demonstrated superior performance over other benchmark models. Furthermore, SHapley Additive exPlanations (SHAP) was employed to pinpoint the most powerful linguistic features for each rating trait, enhancing the model’s transparency through explainable AI (XAI). These insights hold the potential to substantially facilitate the advancement of multi-dimensional approaches toward writing assessment and instruction.
