2020
DOI: 10.1017/pan.2020.8

Automated Text Classification of News Articles: A Practical Guide

Abstract: Automated text analysis methods have made possible the classification of large corpora of text by measures such as topic and tone. Here, we provide a guide to help researchers navigate the consequential decisions they need to make before any measure can be produced from the text. We consider, both theoretically and empirically, the effects of such choices using as a running example efforts to measure the tone of New York Times coverage of the economy. We show that two reasonable approaches to corpus selection …

Cited by 104 publications (86 citation statements)
References 36 publications
“…These four best practices are hardly original: most of them have been proposed in previous best practice articles (e.g. Grimmer & Stewart, 2013; Barberá, Boydstun, Linn, McMahon, & Nagler, 2020; Van Atteveldt & Peng, 2018). With our empirical findings, this discussion illustrates the importance of these best practices.…”
Section: Results (supporting)
confidence: 58%
“…For example, Fogel-Dror, Shenhav, Sheafer, and Van Atteveldt (2018) utilize the off-the-shelf LSD in an analysis of sentiment toward news entities using a validated, rule-based approach. If one has to hand-annotate 1% of the material and that amounts to a few thousand articles, a new study shows that there is more than enough data to train and validate an accurate supervised machine learning model of news sentiment (Barberá et al., 2020). Regardless, all these new applications require heavy human validation.…”
(mentioning)
confidence: 99%
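The workflow described in this excerpt, hand-coding a small share of the corpus and then training and validating a supervised classifier on that sample, can be sketched in a few lines of Python. The snippet below is a minimal illustration, not the procedure used by Barberá et al. (2020); the file name annotated_articles.csv and its columns "text" and "tone" are assumptions for the example.

```python
# Minimal sketch (not the authors' code): train and validate a supervised
# sentiment classifier on a hand-annotated sample of news articles.
# "annotated_articles.csv" and its columns "text" and "tone" are
# hypothetical placeholders.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# A few thousand hand-coded articles (e.g. roughly 1% of the full corpus).
annotated = pd.read_csv("annotated_articles.csv")
texts, labels = annotated["text"], annotated["tone"]

# Hold out part of the annotated sample to validate against human coding.
X_train, X_val, y_train, y_val = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# A bag-of-words representation with a regularized linear classifier is a
# common baseline for document-level tone classification.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=5),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

# Report accuracy, precision, and recall on the held-out annotations.
print(classification_report(y_val, model.predict(X_val)))
```

As the quoted passage stresses, any such model should still be validated against held-out human annotations before its output is used as a measure.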
“…9. As a point of comparison, Barberá et al. (2021) report an accuracy of 71% for the classification of sentiment at the sentence level. 10.…”
Section: Notes (mentioning)
confidence: 99%
“…This very famously includes data documenting user behavior in digital environments, so-called digital trace data (Freelon, 2014; Golder & Macy, 2014; Howison et al., 2011; Jungherr, 2015). Increasingly, however, other large datasets that are relevant for political communication research have become available digitally, such as large text corpora covering fields as diverse as newspaper coverage (Barberá et al., 2020), literature (Piper, 2018; Underwood, 2019), historical or contemporary parliamentary speeches (Rauh & Schwalbach, 2020), and images (Williams et al., 2020). All these datasets are legitimate objects of computational communication, and it is thus unnecessarily limiting to restrict one's definition of CSS to specific types of datasets.…”
Section: Defining Computational Communication Science (mentioning)
confidence: 99%