Automated text analytic techniques have taken on an increasingly important role in the study of parties and political speech. Researchers have studied manifestos, speeches in parliament, and debates at party national meetings, and these methods have demonstrated substantial promise for measuring latent characteristics of texts. In application, however, scaling models require a large number of decisions on the part of the researcher that likely hold substantive implications for the analysis. Past research has proposed discussing these implications, but there is no clear prescription or systematic examination of these choices aimed at establishing a set of best practices for speeches at parties' national meetings in a comparative setting. We examine the implications of these choices with data from intra-party meetings in Germany, Italy, and the Netherlands, and from prime minister speeches in Denmark. We conclude with considerations for those undertaking political text analyses.

Automated text analysis methods offer substantial opportunities to develop and test political science theories. These tools have been used to study topics as diverse as MPs' behavior in parliament (Schonhardt-Bailey 2006; Klemmensen et al. 2007; Quinn et al. 2010; Lucas et al. 2015). In using these methods, analysts face substantial choices prior to the implementation of the primary analysis. For example, researchers often reduce linguistic complexity by removing uninformative stopwords and by stemming documents, and facilitate model estimation by removing rare (or very common) terms. Although these practices are common and uncontroversial in computer science and linguistics (e.g., Hollink et al. 2004; Manning et al. 2008), the potential substantive implications of these choices for frequently used scaling models applied to political texts are less clear (see Denny and Spirling 2016 for a similar approach).
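To make these preprocessing steps concrete, the sketch below illustrates a pipeline of this kind in Python: stopword removal, stemming, and trimming of very rare and very common terms. It is a minimal illustration rather than the exact pipeline used in our analyses; the documents are invented, and the min_df and max_df thresholds are placeholders that researchers would set for their own corpora.

```python
import re

from nltk.corpus import stopwords  # requires nltk.download("stopwords")
from nltk.stem.snowball import SnowballStemmer
from sklearn.feature_extraction.text import CountVectorizer

# Toy documents standing in for party congress speeches.
docs = [
    "The party congress debated the new economic programme.",
    "Delegates debated taxation and the economy at the congress.",
    "The leadership presented its programme to the delegates.",
]

stemmer = SnowballStemmer("english")     # language-specific choice
stops = set(stopwords.words("english"))  # language-specific choice

def preprocess(text):
    """Lowercase, tokenize, drop stopwords, and stem the remaining words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(stemmer.stem(t) for t in tokens if t not in stops)

# Build the document-term matrix, trimming rare (min_df) and very
# common (max_df) terms; these thresholds are illustrative only.
vectorizer = CountVectorizer(min_df=2, max_df=0.9)
dtm = vectorizer.fit_transform([preprocess(d) for d in docs])

print(vectorizer.get_feature_names_out())
print(dtm.toarray())
```

Each of these steps alters the document-term matrix that a scaling model ultimately sees, which is precisely why their substantive consequences merit scrutiny.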
Differences between languages further complicate the formulation of best practices, especially for those engaged in comparative, cross-national research (but see Lucas et al. 2015).

In this paper, we examine the consequences of these choices for frequently used models that estimate the latent positions of political actors from spoken and written texts (e.g. Wordfish). In particular, we review these practices and consider their implications for the estimated position of a document and the estimated uncertainty of that position. We propose that decisions related to the processing of political documents influence the consistency and the reliability of the results obtained from automated text analysis. Practices designed to reduce linguistic complexity reduce the uncertainty associated with individual speeches by isolating the most informative words. We demonstrate these characteristics using party leader speeches and written texts of internal party debates in a variety of languages from Germany, Italy, and the Netherlands, and annual prime minister speeches from Denmark. We conclude that researchers sho...
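To fix ideas, the standard Wordfish specification (Slapin and Proksch 2008) models the count $y_{ij}$ of word $j$ in document $i$ as Poisson-distributed; the notation here is ours, as a sketch of that specification:

$$
y_{ij} \sim \mathrm{Poisson}(\lambda_{ij}), \qquad
\log \lambda_{ij} = \alpha_i + \psi_j + \beta_j \theta_i,
$$

where $\theta_i$ is the latent position of document $i$, $\alpha_i$ and $\psi_j$ are document and word fixed effects capturing document length and overall word frequency, and $\beta_j$ measures how strongly word $j$ discriminates between positions. Seen through this specification, the intuition above is direct: words with $\beta_j$ near zero carry little information about $\theta_i$, so removing them via stopword lists or frequency trimming can reduce the uncertainty of the estimated positions without discarding signal.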