2014
DOI: 10.1093/pan/mpt030

Separating the Wheat from the Chaff: Applications of Automated Document Classification Using Support Vector Machines

Abstract: Due in large part to the proliferation of digitized text, much of it available for little or no cost from the Internet, political science research has experienced a substantial increase in the number of data sets and large-n research initiatives. As the ability to collect detailed information on events of interest expands, so does the need to efficiently sort through the volumes of available information. Automated document classification presents a particularly attractive methodology for accomplishing this tas…

citations
Cited by 59 publications
(52 citation statements)
references
References 56 publications
“…In the supervised case, which is relatively rare in political science applications (though see e.g. Laver, Benoit and Garry, 2003; Hopkins and King, 2010; Diermeier et al, 2011; D'Orazio et al, 2014; King, Lam and Roberts, 2017), researchers have a set of hand-labeled training documents and they wish to learn the relationship between the features (e.g. terms) those texts contain and the labels they were given.…”
Section: Text Preprocessing As Feature Selection: Supervised Vs Unsup
mentioning
confidence: 99%
“…Unsupervised statistical learning tools also exist, which are useful for revealing other patterns within the human rights document corpus [7, 13-16] without reference to the existing coded human rights variables, which we describe below. These tools are more generally part of the emergent field of computational social science or "big data" analysis [17-19] of which there are several recent examples in the study of human rights [1, 10, 20-23] and many other examples from political science and social science more generally [11, 24-28].…”
Section: Document-term Matrices
mentioning
confidence: 99%
“…Even with the feature-reduction steps described in the previous section, the vector space representation of our textual data is extremely sparse and high dimensional. This is problematic, because the efficiency of many supervised learning algorithms, such as neural networks, boosted trees, and random forests, tends to degrade with the dimensionality of the data (Caruana et al, 2008). We chose to use SVMs, which have been shown to be particularly good at dealing with sparse, high-dimensional data structures such as ours (D'Orazio et al, 2014; Joachims, 1998). The goal of our SVMs here is to produce a model from labeled and processed textual data that we can then apply to generate labels for country-year observations for which we do not have labels but do have processed textual data.…”
Section: Classification
mentioning
confidence: 99%
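
The workflow this statement describes (fit an SVM on labeled, sparse text features, then label unseen documents) can be illustrated with a small sketch. This is not the implementation used by the cited or citing authors. It assumes a bag-of-words representation and trains a linear SVM with a Pegasos-style stochastic subgradient method, chosen here only because it fits in a few lines of standard-library Python. The function names, toy documents, and parameter values are all hypothetical. Real applications would more likely use an established package such as LIBSVM or scikit-learn.

```python
import random
from collections import Counter


def featurize(text):
    """Sparse bag-of-words: only terms that occur in the text are stored."""
    return Counter(text.lower().split())


def train_linear_svm(docs, labels, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM (hinge loss, L2 penalty) by Pegasos-style
    stochastic subgradient descent. Labels must be +1 or -1.
    Returns a sparse weight vector as a dict keyed by term.
    No bias term is fitted, to keep the sketch short."""
    rng = random.Random(seed)
    data = [(featurize(d), y) for d, y in zip(docs, labels)]
    w = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y * sum(w.get(term, 0.0) * c for term, c in x.items())
            # Shrink all weights (gradient of the L2 penalty).
            scale = 1.0 - eta * lam
            for term in w:
                w[term] *= scale
            # Hinge-loss subgradient step, only when the margin is violated.
            if margin < 1:
                for term, c in x.items():
                    w[term] = w.get(term, 0.0) + eta * y * c
    return w


def predict(w, text):
    """Label a new document by the sign of its score under the weights."""
    score = sum(w.get(term, 0.0) * c for term, c in featurize(text).items())
    return 1 if score > 0 else -1


# Hypothetical hand-labeled training set: +1 = relevant, -1 = not relevant.
train_docs = [
    "troops fired across border",
    "army shelled border towns",
    "stock markets rose sharply",
    "bond markets fell sharply",
]
train_labels = [1, 1, -1, -1]

w = train_linear_svm(train_docs, train_labels)

# Apply the fitted model to documents that have no labels.
label_a = predict(w, "troops shelled border towns")
label_b = predict(w, "bond markets rose")
```

Because the weight vector is stored as a dictionary, each update touches only the terms present in one document, which is one reason linear SVMs cope well with sparse, high-dimensional text features.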