“…More recently, advanced NLP techniques, such as neural network-based distributed language representation learning approaches (e.g., word2vec) and transfer learning approaches (e.g., BERT), have been applied to short answer grading [34,44,45]. In massive open online courses (MOOCs), NLP techniques along with classification algorithms (e.g., logistic regression, random forest) have been used to examine data from discussion forums for a wide range of tasks, such as predicting students' learning outcomes, sentiment analysis [27], confusion detection [14], and cognitive presence [3,12].…”