quanteda is an R package providing a comprehensive workflow and toolkit for natural language processing tasks such as corpus management, tokenization, analysis, and visualization. It has extensive functions for applying dictionary analysis, exploring texts using keywords-in-context, computing document and feature similarities, and discovering multi-word expressions through collocation scoring. Based entirely on sparse operations, it provides highly efficient methods for compiling document-feature matrices and for manipulating these or using them in further quantitative analysis. Using C++ and multithreading extensively, quanteda is also considerably faster and more efficient than other R and Python packages in processing large textual data. quanteda: An R package for the quantitative analysis of textual data.
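To make the idea of a sparse document-feature matrix concrete, here is a minimal sketch in plain Python. It is not quanteda's API or implementation: the function names `tokenize` and `build_dfm`, the naive whitespace tokenizer, and the two sample texts are all assumptions made for illustration. It shows only the core idea that each document stores counts for the features it actually contains, so zero cells are never materialized.

```python
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation.

    A deliberately naive tokenizer, assumed here for illustration only.
    """
    tokens = (tok.strip(".,;:!?\"'()").lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def build_dfm(docs):
    """Return (features, rows) for a list of document strings.

    features: sorted list of all distinct tokens (the matrix columns).
    rows: one dict per document mapping column index -> count.
          Absent keys stand for zero counts, which keeps the matrix sparse.
    """
    counts = [Counter(tokenize(doc)) for doc in docs]
    features = sorted(set().union(*counts))
    index = {feat: j for j, feat in enumerate(features)}
    rows = [{index[feat]: n for feat, n in c.items()} for c in counts]
    return features, rows


# Hypothetical two-document corpus.
docs = ["Taxes must fall.", "Taxes fund schools; schools matter."]
features, rows = build_dfm(docs)

for name, row in zip(["doc1", "doc2"], rows):
    # Print only the non-zero cells of each row.
    print(name, {features[j]: n for j, n in sorted(row.items())})
```

Storing one small dictionary per document means memory grows with the number of non-zero counts, not with documents times vocabulary size, which is the property that makes sparse representations attractive for large corpora.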
Scholars estimating policy positions from political texts typically code words or sentences and then build left-right policy scales based on the relative frequencies of text units coded into different categories. Here we reexamine such scales and propose a theoretically and linguistically superior alternative based on the logarithm of odds ratios. We contrast this scale with the current approach of the Comparative Manifesto Project (CMP), showing that our proposed logit scale avoids widely acknowledged flaws in previous approaches. We validate the new scale using independent expert surveys. Using existing CMP data, we show how to estimate more distinct policy dimensions, for more years, than has been possible before, and make this dataset publicly available. Finally, we draw some conclusions about the future design of coding schemes for political texts.
Almost anyone interested in party competition, whether this takes place in legislatures, the electoral arena, or government, needs sooner or later to estimate the policy positions of key political actors, whether these be individual legislators or the political parties to which they affiliate. Indeed, "how to best measure the policy preferences of individual legislators and of legislative parties" (Loewenberg 2008, 499) forms one of the central problems of legislative research. This is particularly true for scholars of comparative legislative research. While in the American setting policy preferences of legislators have been conceptualized as individual-level variables, tight party discipline in many non-American contexts makes it difficult to derive
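The contrast drawn in the abstract above, between a scale built from relative frequencies and one built from the logarithm of odds ratios, can be sketched as follows. The abstract does not give the formulas, so everything here is an assumption for illustration: the function names, the smoothing constant of 0.5 added to each count to keep the ratio finite when a count is zero, and the sample counts are all hypothetical.

```python
import math


def relative_frequency_scale(right, left, total):
    """Difference between the shares of right- and left-coded text units.

    Assumed form of a relative-frequency scale; `total` is the number of
    coded units in the document, including units in neither category.
    """
    return (right - left) / total


def logit_scale(right, left, smoothing=0.5):
    """Log of the odds of right-coded versus left-coded text units.

    `smoothing` is an assumed constant added to both counts so the
    result is defined even when one of the counts is zero.
    """
    return math.log((right + smoothing) / (left + smoothing))


# Hypothetical documents of 100 coded units each: (right count, left count).
examples = [(2, 1), (20, 10), (40, 20)]
for right, left in examples:
    rf = relative_frequency_scale(right, left, total=100)
    lg = logit_scale(right, left)
    print(f"R={right:>2} L={left:>2}  relative frequency={rf:.2f}  logit={lg:.3f}")
```

In this sketch the relative-frequency scale depends on how much of the document is coded into neither category, while the logit scale depends only on the balance between the two counts and is symmetric around zero.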