“…Several recent surveys (Acheampong et al., 2020; Alswaidan and Menai, 2020) and studies (Öhman et al., 2020; Bostan et al., 2020; Bostan and Klinger, 2018; Schuff et al., 2017) list previous work on emotion detection from texts and emphasise their differences in type of emotion taxonomy, task (single-label or multi-label), size of the dataset, text genre, granularity, topics, system architectures, and best results obtained with systems for automatic detection of emotions in texts. However, none of the studies focussed on assessing the quality of benchmark datasets, or the influence of methods used for obtaining gold labels on the results of systems for automatic emotion detection from texts.…”