2018
DOI: 10.1177/0894439318758389

Proportional Classification Revisited: Automatic Content Analysis of Political Manifestos Using Active Learning

Abstract: Supervised machine learning is a promising methodological innovation for content analysis (CA) to approach the challenge of ever-growing amounts of text in the digital era. Social scientists have pointed to accurate measurement of category proportions and trends in large collections as their primary goal. Proportional classification, for example, allows for time-series analysis of diachronic data sets or correlation of categories with text-external covariates. We evaluate the performance of two common approach…

Cited by 27 publications (16 citation statements)
References 36 publications
“…Table I shows the performance of unlabeled data stopping methods. SP, one of the most widely applicable and easy-to-implement methods, has leading performance, consistent with past findings [13], [14], [21]. Accordingly, we use SP as representative of state of the art unlabeled data stopping methods in the rest of our experiments.…”
Section: Stopping Methods Parameters
citation type: supporting
confidence: 74%
“…In [13], [14], [21], and in our results in section V, SP is shown to be a leading stopping method that uses unlabeled data. Therefore, in the rest of this paper, we use the SP stopping method as representative of the state of the art of stopping methods that use unlabeled data.…”
Section: B Stopping Methods That Use Unlabeled Data
citation type: mentioning
confidence: 99%
“…This erroneous classification can lead to incorrect associations being observed between the assigned categories and the outcomes of interest [26], thereby biasing inferences drawn from the data collected [27], often substantially [28], or decreasing the power of the study [29]. As highlighted by Kloos et al [9], misclassification bias occurs in a broad range of applications, including epidemiology [30], political science [31], and official statistics [32]. The objective of these applications is to shift focus from minimizing loss functions at the level of individual predictions to the level of aggregated predictions.…”
Section: Related Work
citation type: mentioning
confidence: 99%
“…A machine classifier learns textual features (especially word occurrences and their combinations) which suggest the existence of a certain class. This allows for an active learning scenario, where new text passages fitting to a certain category are identified within a collection automatically (Wiedemann, 2018). In an iterated process, automatic suggestions can be corrected by a human annotator to improve the classification model.…”
Section: Analysis Features
citation type: mentioning
confidence: 99%
“…Lexicometric methods such as keyword extraction, frequency- and co-occurrence analysis are already established and widely used in social science text analysis. Machine learning methods such as data-driven clustering of document collections using topic models (Blei, 2012; Stier et al, 2017) or the training of automatic classification methods for coding texts (Lemke et al, 2015; Stier et al, 2018; Posch et al, 2015; Wiedemann, 2018) begin outreaching into this field. As such applications of text mining help to combine qualitative and quantitative analysis perspectives, they are becoming increasingly relevant as a so-called "mixed methods" approach in the social sciences.…”
Section: Introduction
citation type: mentioning
confidence: 99%