Both of these treat the text as a collection of exchangeable tokens, where a token is either a word or a phrase. The first approach is the bag-of-words approach, which requires the researcher to specify a dictionary of positive and negative tokens (McDonald, 2011, 2014; Chen et al., 2014; Heston and Sinha, 2017; Renault, 2017; Jiang et al., 2019). The second approach, which we refer to as the tokenization approach, does not require the researcher to specify prior beliefs about the positivity or negativity of individual tokens; instead, it uses manually labeled text to identify relevant tokens (Taddy, 2013a,b; Mitra and Gilbert, 2014; Ranco et al., 2015; Oliveira et al., 2016; Renault, 2017).
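To make the contrast concrete, the following is a minimal Python sketch of the two approaches on toy data. The word lists, example texts, and labels are purely illustrative assumptions, not the dictionaries or corpora used in the cited studies; the supervised step uses scikit-learn's `CountVectorizer` and `LogisticRegression` as one plausible instantiation of learning relevant tokens from labeled text.

```python
# Illustrative sketch only: toy word lists and labels, not the
# dictionaries or training corpora from the cited studies.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = [
    "strong earnings beat expectations",
    "profit warning and weak guidance",
    "record revenue growth this quarter",
    "losses widen as demand falls",
]
labels = [1, 0, 1, 0]  # manual labels: 1 = positive, 0 = negative

# Approach 1: bag-of-words with a researcher-specified dictionary.
POSITIVE = {"strong", "beat", "record", "growth"}  # assumed toy lists
NEGATIVE = {"warning", "weak", "losses", "falls"}

def dictionary_score(text: str) -> int:
    """Net sentiment: positive token count minus negative token count."""
    tokens = text.lower().split()
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

print([dictionary_score(t) for t in texts])  # [2, -2, 2, -2]

# Approach 2: the tokenization approach -- no prior word lists; which
# tokens matter is learned from the manually labeled examples instead.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)          # token-count features
clf = LogisticRegression().fit(X, labels)    # learns token weights
print(clf.predict(vectorizer.transform(["weak earnings and losses"])))
```

The key design difference the sketch highlights: in the first approach the researcher's priors enter through the fixed `POSITIVE`/`NEGATIVE` sets, while in the second the token weights are estimated entirely from the labeled examples.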