2022
DOI: 10.1609/icwsm.v16i1.19377

The Reddit Politosphere: A Large-Scale Text and Network Resource of Online Political Discourse

Abstract: We introduce the Reddit Politosphere, a large-scale resource of online political discourse covering more than 600 political discussion groups over a period of 12 years. It is to the best of our knowledge the largest and ideologically most comprehensive dataset of its type now available. One key feature of the Reddit Politosphere is that it consists of both text and network data, allowing for methodologically-diverse analyses. We describe in detail how we create the Reddit Politosphere, present descriptive stat…

Cited by 7 publications (3 citation statements)
References 48 publications
“…In this section, we provide various statistical analysis of the dataset in order to validate the various attributes of the data, discuss their variation with respect to time and location, and analyze the correlation between the features. These descriptive statistics provide a summary of the central tendency, variability, and distribution of each variable in the dataset (Hofmann et al, 2022). (Heiny, 2022).…”
Section: Results
Citation type: mentioning
confidence: 99%
“…Building on the approach of Ribeiro et al (2020) and Röttger et al (2021), we had the goal of designing a balanced diagnostic dataset for probing how well models capture the meanings of scalar adverbs. Our primary dataset consists of 960 items, which are based on posts from the year 2015 in the Reddit politosphere dataset introduced in Hofmann et al (2022). This slice represents about 6GB of data from a range of political subreddits (e.g., r/conservative or r/anarchist).…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…Its diversity (Weninger, Zhu, and Han 2013) and active participation (Choi et al 2015) make Reddit an excellent source for data collection (Jamnik and Lane 2019), particularly on geopolitical issues like the Israel-Hamas conflict. Subreddits allow for focused data gathering, providing insights into public opinion and discourse (Hofmann, Schütze, and Pierrehumbert 2022; Petruzzellis et al 2023). Conversations on Reddit, including submissions and comments, offer valuable, detailed discussions and diverse viewpoints, contributing significantly to research on complex social and political topics by providing real-time engagement and sentiment analysis (Park et al 2021; He, May, and Lerman 2023).…”
Section: Introduction
Citation type: mentioning
confidence: 99%