“…These experiments relate to a large body of work that considers how preprocessing methods affect the downstream accuracy of various algorithms, ranging from information retrieval (Chaudhari et al, 2015; Patil and Atique, 2013; Beil et al, 2002), text classification and regression (Forman, 2003; Yang and Pedersen, 1997; Vijayarani et al, 2015; Kumar and Harish, 2018; HaCohen-Kerner et al, 2020; Symeonidis et al, 2018; Weller et al, 2020), and topic modeling (Blei et al, 2003; Lund et al, 2019; Schofield and Mimno, 2016; Schofield et al, 2017a,b) to more complex tasks like question answering (Jijkoun et al, 2003; Carvalho et al, 2007) and machine translation (Habash, 2007; Habash and Sadat, 2006; Leusch et al, 2005; Weller et al, 2021; Mehta et al, 2020), to name a few. With the rise of noisy social media, text preprocessing has become important for tasks that use data from sources like Twitter and Reddit (Symeonidis et al, 2018; Singh and Kumari, 2016; Bao et al, 2014; Jianqiang, 2015; Weller and Seppi, 2020; Zirikly et al, 2019; Babanejad et al, 2020).…”