2008
DOI: 10.1080/19331680801975367
Computer-Assisted Topic Classification for Mixed-Methods Social Science Research

Abstract: Social scientists interested in mixed-methods research have traditionally turned to human annotators to classify the documents or events used in their analyses. The rapid growth of digitized government documents in recent years presents new opportunities for research but also new challenges. With more and more data coming online, relying on human annotators becomes prohibitively expensive for many tasks. For researchers interested in saving time and money while maintaining confidence in their results, we show …

Cited by 106 publications
(110 citation statements)
References 23 publications
“…Heuristically, as long as the classifiers are accurate and diverse, combining the classifiers will improve accuracy (Jurafsky and Martin 2009). But ensembles are also useful for other reasons, including: increased out-of-sample stability and the ability to capture complex functional forms with relatively simple classifiers (Dietterich 2000; Hillard, Purpura, and Wilkerson 2008). Schemes for developing ensembles are diverse.…”
Section: Applying a Supervised Learning Model (mentioning)
confidence: 99%
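The claim in the statement above — that combining accurate, diverse classifiers improves accuracy — can be illustrated with a minimal majority-vote sketch. The keyword rules below are invented for illustration and are not from the paper:

```python
from collections import Counter

def majority_vote(classifiers, doc):
    """Combine predictions from several classifiers by simple majority vote."""
    votes = [clf(doc) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Three hypothetical keyword classifiers, each mapping a document
# string to a topic label. Two are reasonable; one is weak.
clf_a = lambda d: "health" if "medicare" in d else "defense"
clf_b = lambda d: "health" if "hospital" in d else "defense"
clf_c = lambda d: "defense"  # weak classifier that always guesses "defense"

doc = "a bill to expand medicare hospital coverage"
print(majority_vote([clf_a, clf_b, clf_c], doc))  # -> health
```

Because the two accurate classifiers disagree with the weak one rather than with each other, the vote recovers the correct label even though one member of the ensemble is always wrong.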
“…In fact, a recent debate in political science casts unsupervised and supervised as competitor methods (e.g., Hillard, Purpura, and Wilkerson 2008; Quinn et al. 2010). This debate is misplaced: supervised and unsupervised methods are different models with different objectives.…”
Section: Discovering Categories and Topics (mentioning)
confidence: 99%
“…There are two general approaches to classify the topics of text: either the topics are known in advance and constitute a static set of categories, for example (Hillard et al., 2008), or they are unknown in advance and dynamically created depending on the data, as in (Quinn et al., 2010) (see also (Grimmer and Stewart, 2013) and (Sebastiani, 2002) for an overview). In our scenario, we assume a common set of topics over several data sources, namely the party manifestos and transcripts of speeches in our case.…”
Section: Determining Topics In Speeches (mentioning)
confidence: 99%
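The "static set of categories" approach described above — topics fixed in advance, documents assigned to them — can be sketched as a toy word-overlap classifier. The training corpus and labels below are invented for illustration:

```python
from collections import Counter

# Hypothetical labeled corpus: the topic categories are fixed in advance,
# as in the supervised, static-category approach.
train = [
    ("tax revenue budget deficit", "economy"),
    ("budget spending tax cuts", "economy"),
    ("troops military defense war", "defense"),
    ("army war defense spending", "defense"),
]

def train_centroids(examples):
    """Build one word-frequency profile per known category."""
    centroids = {}
    for text, label in examples:
        centroids.setdefault(label, Counter()).update(text.split())
    return centroids

def classify(centroids, text):
    """Assign the category whose word profile overlaps the document most."""
    words = Counter(text.split())
    def overlap(label):
        return sum(min(words[w], centroids[label][w]) for w in words)
    return max(centroids, key=overlap)

centroids = train_centroids(train)
print(classify(centroids, "defense war budget"))  # -> defense
```

By contrast, the unsupervised alternative mentioned in the statement (e.g., topic models) would discover the categories from word co-occurrence rather than take them as given.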
“…The benefits of computer-assisted text analysis over hand coding include the natural improvements in speed, the ability to process high volumes of text, and the consistency of treatment of all parts of the corpus (Grimmer & King, 2011; Hillard, Purpura, & Wilkerson, 2008; Lowe & Benoit, 2013). Humans often struggle with the development of complicated coding schemes (Quinn, Monroe, Colaresi, Crespin, & Radev, 2010), and there is some experimental evidence to suggest that humans judge clusters produced by automated methods to be more semantically coherent than even a taxonomy created by the documents' authors (Grimmer & King, 2011; Grimmer & Stewart, 2013).…”
Section: Introduction To Text Analysis (mentioning)
confidence: 99%