This article presents a new method and open-source R package that uses syntactic information to automatically extract source–subject–predicate clauses. This improves on frequency-based text analysis methods by dividing text into predicates with an identified subject and an optional source, thereby extracting the statements and actions of (political) actors as mentioned in the text. The content of these predicates can be analyzed with existing frequency-based methods, allowing the analysis of actions, issue positions, and framing by different actors within a single text. We show that a small set of syntactic patterns can extract clauses and identify quotes with good accuracy, significantly outperforming a baseline system based on word order. Taking the 2008–2009 Gaza war as an example, we further show how corpus comparison and semantic network analysis applied to the results of the clause analysis reveal differences in citation and framing patterns between U.S. coverage and Chinese English-language coverage of the war.
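The abstract does not reproduce the package's actual patterns (and the package itself is written in R). As a rough illustration of the general idea, the following hypothetical Python sketch matches one source–subject–predicate pattern over a dependency parse: a speech verb whose `nsubj` is the source and whose `ccomp` verb carries the quoted subject and predicate. The token structure, the `extract_clauses` helper, and the example sentence are all invented for illustration.

```python
# Toy sketch (not the actual R package): extract a source–subject–predicate
# clause from a dependency parse using one syntactic pattern.

SPEECH_VERBS = {"said", "told", "claimed", "announced"}

def extract_clauses(tokens):
    """Match the pattern: source --nsubj--> speech verb --ccomp--> predicate."""
    by_head = {}
    for t in tokens:
        by_head.setdefault(t["head"], []).append(t)

    clauses = []
    for verb in tokens:
        if verb["text"].lower() not in SPEECH_VERBS:
            continue
        children = by_head.get(verb["id"], [])
        source = next((c for c in children if c["deprel"] == "nsubj"), None)
        comp = next((c for c in children if c["deprel"] == "ccomp"), None)
        if source is None or comp is None:
            continue
        comp_children = by_head.get(comp["id"], [])
        subject = next((c for c in comp_children if c["deprel"] == "nsubj"), None)
        if subject is None:
            continue
        # Predicate = the complement verb plus its non-subject, non-marker children.
        predicate = [comp["text"]] + [
            c["text"] for c in comp_children if c["deprel"] not in ("nsubj", "mark")
        ]
        clauses.append({
            "source": source["text"],
            "subject": subject["text"],
            "predicate": " ".join(predicate),
        })
    return clauses

# Invented parse of: "Netanyahu said that Hamas fired rockets."
tokens = [
    {"id": 1, "text": "Netanyahu", "head": 2, "deprel": "nsubj"},
    {"id": 2, "text": "said",      "head": 0, "deprel": "root"},
    {"id": 3, "text": "that",      "head": 5, "deprel": "mark"},
    {"id": 4, "text": "Hamas",     "head": 5, "deprel": "nsubj"},
    {"id": 5, "text": "fired",     "head": 2, "deprel": "ccomp"},
    {"id": 6, "text": "rockets",   "head": 5, "deprel": "obj"},
]

print(extract_clauses(tokens))
# → [{'source': 'Netanyahu', 'subject': 'Hamas', 'predicate': 'fired rockets'}]
```

On a real pipeline the token dictionaries would come from a dependency parser rather than being written by hand; the point is only that a handful of such patterns over parse trees can separate who is speaking from who is acting.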
A crucial challenge in measuring how text represents an entity is the need to associate each representative expression with the relevant entity in order to generate meaningful results. Common solutions to this problem are usually based on proximity methods that require a large corpus to reach reasonable levels of accuracy. We show how such methods for associating an entity with a representation yield a high percentage of false positives at the expression level and low validity at the document level. We introduce a solution that combines syntactic parsing, semantic role labeling logic, and a machine learning approach: the role-based association method. To test our method, we compared it with prevalent association methods on the news coverage of two entities of interest: the State of Israel and the Palestinian Authority. We found that the role-based association method is more accurate at both the expression and document levels.
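The abstract does not spell out the method's internals, but the contrast it draws can be sketched with a hypothetical toy example: a proximity baseline attaches an evaluative expression to the nearest entity mention, while a role-based approach attaches it to the entity filling the relevant semantic role. The sentence, role annotations, and both helper functions below are invented for illustration only.

```python
# Hypothetical toy contrast: proximity-based vs. role-based association of an
# evaluative expression with an entity. All data here is invented.

# "Hamas, Israel said, violated the ceasefire."
mentions = [("Hamas", 0), ("Israel", 2)]      # (entity, token index)
expression = ("violated the ceasefire", 4)    # (expression, token index)

def proximity_associate(expr, mentions):
    """Baseline: attach the expression to the closest entity mention."""
    pos = expr[1]
    return min(mentions, key=lambda m: abs(m[1] - pos))[0]

# Role annotation (as a semantic role labeler might supply): the agent of
# "violated" is Hamas; Israel is only the source of the quote.
roles = {"violated the ceasefire": {"agent": "Hamas", "source": "Israel"}}

def role_associate(expr, roles):
    """Role-based: attach the expression to the filler of the agent role."""
    return roles[expr[0]]["agent"]

print(proximity_associate(expression, mentions))  # → Israel (a false positive)
print(role_associate(expression, roles))          # → Hamas
```

The proximity baseline picks Israel simply because its mention sits closer to the expression, which is exactly the kind of expression-level false positive the abstract describes; conditioning on the semantic role recovers the intended entity.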
The collaborative effort of a theory-driven content analysis can benefit significantly from topic analysis methods that allow researchers to add more categories while developing or testing a theory. Additivity also enables the reuse of previous efforts or the merging of separate research projects, thereby increasing the accessibility of such methods and the discipline's ability to create shareable content analysis capabilities. This paper proposes a weakly supervised topic analysis method, which combines a low-cost unsupervised method to compile a training set with supervised deep learning as an additive and accurate text classification method. We test the validity of the method, specifically its additivity, by comparing the results of the method after adding 200 categories to an initial set of 450. We show that the suggested method is a solid starting point for a low-cost and additive solution for large-scale topic analysis.
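As a rough illustration of the two-stage idea (not the authors' actual pipeline), the hypothetical sketch below compiles a noisy training set with seed keywords and then trains a simple classifier on it; a nearest-centroid bag-of-words model stands in for the deep learning component, and all categories, seed words, and documents are invented. Additivity shows up as the ability to introduce a new category by supplying new seeds, without relabeling existing data.

```python
# Hypothetical miniature of a weakly supervised topic pipeline: seed keywords
# compile a noisy training set (unsupervised step), then a word-frequency
# centroid model (stand-in for the deep classifier) is trained on it.
from collections import Counter

seeds = {
    "economy": {"tax", "budget", "inflation"},
    "security": {"army", "attack", "border"},
}

corpus = [
    "the budget debate raised the tax question again",
    "inflation eroded the national budget",
    "the army reinforced the border after the attack",
    "border patrols reported an attack",
]

def weak_label(doc, seeds):
    """Unsupervised step: label a document by seed-keyword overlap."""
    words = set(doc.split())
    scores = {cat: len(words & kw) for cat, kw in seeds.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def train_centroids(corpus, seeds):
    """Supervised step: per-category word-frequency centroids built from
    the weakly labeled training set."""
    centroids = {}
    for doc in corpus:
        cat = weak_label(doc, seeds)
        if cat is not None:
            centroids.setdefault(cat, Counter()).update(doc.split())
    return centroids

def classify(doc, centroids):
    """Score a new document against each centroid by summed word weights."""
    words = doc.split()
    return max(centroids, key=lambda c: sum(centroids[c][w] for w in words))

centroids = train_centroids(corpus, seeds)
print(classify("new tax rules and the budget deficit", centroids))
# → economy

# Additivity: a new category only needs new seeds, not relabeled data.
seeds["health"] = {"hospital", "vaccine", "clinic"}
```

The real method replaces the centroid model with a deep classifier, but the additive property works the same way: each new category contributes its own weakly labeled examples without disturbing the categories already trained.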
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.