“…Processing and cleaning text data for semi-automated classification can require varying amounts of effort and different techniques. However, a set of commonly used techniques has already been established: spellchecking (Quillo-Espino et al., 2018), lowercasing (Foster et al., 2020), stemming (Jivani, 2011; Bao et al., 2014; Singh and Gupta, 2017), lemmatization (Bao et al., 2014; Banks et al., 2018; Symeonidis et al., 2018; Foster et al., 2020), stopword removal (Foster et al., 2020), and different ways of enriching the text with linguistic features (Foster et al., 2020). We systematically review these options below and justify our choices (for an overview, see Table 2).…”
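
To make three of the listed steps concrete (lowercasing, stopword removal, and stemming), the following is a minimal Python sketch using only the standard library. It is not the pipeline used in the quoted work: the stopword list, the suffix list, and the function names `naive_stem` and `preprocess` are hypothetical and chosen for illustration, and the suffix stripper is a deliberately crude stand-in for an established stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stopword list; real projects use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "for"}

# Naive suffix list for illustration only; this is NOT the Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, stem=True):
    """Lowercase, tokenize on ASCII letters, drop stopwords, optionally stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in STOPWORDS]
    if stem:
        tokens = [naive_stem(t) for t in tokens]
    return tokens


if __name__ == "__main__":
    print(preprocess("The cats are jumping in the garden"))
```

Keeping each step behind its own switch (here only `stem`) makes it straightforward to compare classification results with and without a given step, which is the kind of choice the passage says needs justifying.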