The paper is devoted to the construction of three resources: a corpus of sentences annotated with overall sentiment in 4 classes (positive, negative, neutral, and mixed), a corpus of phrasemes annotated with sentiment in 3 classes (positive, negative, and neutral), and a corpus of sentences annotated for the presence or absence of irony. The annotation was carried out by volunteers within the project “Prepare texts for algorithms” on the portal “People of science”.
Existing domain knowledge for each task served as the basis for developing the annotation guidelines. We also developed a technique for statistical analysis of the annotation results, based on the label distributions and on agreement measures between annotators.
For the annotation of sentences for irony and of phrasemes for sentiment, agreement was rather high (full agreement rates of 0.60–0.99), whereas for the annotation of sentences by overall sentiment it was low (a full agreement rate of 0.40), presumably due to the higher complexity of the task. It was also shown that automatic sentence-sentiment detection improved by 12–13% when using the subset of the corpus on which all annotators (three to five per sentence) agreed, compared with a corpus annotated by only one volunteer.
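The full agreement rate reported above can be illustrated with a minimal sketch: the fraction of items on which every annotator assigned the same label. The function name and the example labels below are hypothetical, assuming per-item label lists with a varying number of annotators (three to five), as in the paper.

```python
def full_agreement_rate(annotations):
    """Fraction of items on which every annotator chose the same label.

    `annotations` is a list of per-item label lists, one label per
    annotator; annotator counts may differ between items (e.g. 3 to 5).
    """
    agreed = sum(1 for labels in annotations if len(set(labels)) == 1)
    return agreed / len(annotations)

# Hypothetical 4-class sentence-sentiment labels from 3-5 annotators.
items = [
    ["positive", "positive", "positive"],                     # full agreement
    ["negative", "neutral", "negative", "mixed"],             # disagreement
    ["neutral", "neutral", "neutral", "neutral", "neutral"],  # full agreement
]
print(full_agreement_rate(items))  # 2 of 3 items agree -> 0.666...
```

Chance-corrected measures such as Fleiss' kappa are commonly reported alongside this raw rate, since full agreement alone does not account for agreement expected by chance.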
This study is aimed at building an automated model for business writing assessment, based on 14 rubrics that integrate EFL teacher assessment frameworks and specify expected performance against various criteria (including language, task fulfillment, content knowledge, register, format, and cohesion). We developed algorithms for computing the corresponding numerical features using methods and tools for automatic text analysis; the algorithms rely on syntactic analysis combined with dictionaries. The model was subsequently evaluated on a corpus of 20 teacher-assessed business letters, with heat maps and UMAP projections used to compare teachers’ and automated score reports. The comparison showed no significant discrepancies between the two, yet revealed bias in the teachers’ reports. The findings suggest that the developed model is an efficient natural language processing tool with highly interpretable results, a clear roadmap for further improvement, and a valid, unbiased alternative to teacher assessment. The results may lay the groundwork for an automated student language profile. Although the model was specifically designed for business letter assessment, it can be easily adapted to other writing tasks, e.g. by replacing the dictionaries.
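The dictionary-based rubric features described above can be sketched as follows. This is a hypothetical illustration, not the authors' actual feature set: a toy cohesion score that counts discourse connectives (from a small stand-in dictionary) per sentence of a letter.

```python
import re

# Hypothetical stand-in for a cohesion dictionary; the paper's actual
# dictionaries and feature definitions are not specified in the abstract.
COHESION_DICT = {"however", "therefore", "moreover", "furthermore", "consequently"}

def cohesion_feature(text):
    """Average number of dictionary connectives per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    hits = sum(
        1
        for s in sentences
        for token in re.findall(r"[a-z]+", s.lower())
        if token in COHESION_DICT
    )
    return hits / len(sentences)

letter = ("We received your order. However, the items are out of stock. "
          "Therefore, we propose a replacement.")
print(cohesion_feature(letter))  # 2 connectives over 3 sentences -> 0.666...
```

A full model would compute one such numerical feature per rubric (register, format, cohesion, etc.) and map the feature vector to rubric scores; swapping the dictionaries adapts the features to other writing tasks, as the abstract notes.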