Social media and microblogging platforms generally contain elements of figurative and nonliteral language, including satire. The identification of figurative language is a fundamental task for sentiment analysis. It will not be possible to obtain sentiment analysis methods with high classification accuracy if elements of figurative language have not been properly identified. Satirical text is a kind of figurative language, in which irony and humor have been utilized to ridicule or criticize an event or entity. Satirical news is a pervasive issue on social media platforms, which can be deceptive and harmful. This paper presents an ensemble scheme for satirical news identification in Turkish news articles. In the presented scheme, linguistic and psychological feature sets have been utilized to extract the feature sets (i.e. linguistic, psychological, personal, spoken categories, and punctuation). In the classification phase, accuracy rates of five supervised learning algorithms (i.e. naive Bayes algorithm, logistic regression, support vector machines, random forest, and k-nearest neighbor algorithm) with three widely utilized ensemble methods (i.e. AdaBoost, bagging, and random subspace) have been considered. Based on the results, we concluded that the random forest algorithm yielded the highest performance, with a classification accuracy of 96.92% for satire detection in Turkish. For deep learning-based architectures, we have achieved classification accuracy of 97.72% with the recurrent neural network architecture with attention mechanism.
Massive open online courses (MOOCs) are recent and widely studied distance learning approaches aimed at providing learning material to learners from geographically dispersed locations without age, gender, or race-related constraints. MOOCs generally enriched by discussion forums to provide interactions among students, professors, and teaching assistants. MOOC discussion forum posts provide feedback regarding the students' learning processes, social interactions, and concerns. The purpose of our research is to present a How to cite this article: Onan A, Toçoğlu MA. Weighted word embeddings and clustering-based identification of question topics in MOOC discussion forum posts.
This study presents a new dataset to be used in emotion extraction studies in Turkish text. We consider emotion extraction as a supervised text classification problem, which thereby requires a dataset for the training process. To satisfy this requirement, we aim to create a new dataset containing data for the six emotion categories: happiness, fear, anger, sadness, disgust and surprise. To gather this dataset, we conducted a survey and collected 27,350 entries from 4709 individuals. In the next step, we performed a validation process in which annotators validated each entry one by one by assigning a related emotion category. As a result of this process, we obtained two datasets, one raw and the other validated. Subsequently, we generated four versions of these two datasets using two different stemming methods and then modelled them using a vector space model. Then, we ran machine learning algorithms, including complement naive Bayes (CNB), random forest (RF), decision tree C4.5 (J48) and an updated version of support vector machines (SVMs), on the models to calculate the accuracy, precision, recall and F-measure values. Based on the results we obtained, we concluded that the SVM classifier yielded the highest performance value and that the models trained with a validated dataset provide more accurate results than the models trained with a non-validated dataset.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.