2021
DOI: 10.1126/sciadv.abe6534

Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter

Abstract: In real time, Twitter strongly imprints world events, popular culture, and the day-to-day, recording an ever-growing compendium of language change. Vitally, and absent from many standard corpora such as books and news archives, Twitter also encodes popularity and spreading through retweets. Here, we describe Storywrangler, an ongoing curation of over 100 billion tweets containing 1 trillion 1-grams from 2008 to 2021. For each day, we break tweets into 1-, 2-, and 3-grams across 100+ languages, generating frequ…

Citations: cited by 27 publications (20 citation statements)
References: 46 publications
“…We use the first and last name of each victim in our dataset to create a set of two word phrases, or 2-grams. For each name, we query its frequency and rank over time using the Storywrangler API 2 [32]. Storywrangler uses a 10% sample of English tweets to measure how often words and phrases (also known generally as ngrams 3) are used on Twitter.…”
Section: Mentions Of Victims' Names On Twitter (mentioning)
confidence: 99%
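
The idea in the statement above, turning a first and last name into a 2-gram and then looking up its frequency and rank, can be sketched on a toy corpus. This is an illustrative sketch only: it does not call the Storywrangler API, the whitespace tokenizer is a simplifying assumption, and the names `two_grams` and `frequency_and_rank` as well as the sample texts are hypothetical.

```python
from collections import Counter


def two_grams(text):
    """Split text on whitespace (an assumed, simplified tokenizer)
    and return all adjacent word pairs as lowercase 2-grams."""
    tokens = text.lower().split()
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def frequency_and_rank(texts, phrase):
    """Return (relative frequency, rank) of a 2-gram within texts.

    Rank 1 is the most frequent 2-gram; tied 2-grams share the best rank.
    Returns (0.0, None) when the phrase never occurs.
    """
    counts = Counter(g for t in texts for g in two_grams(t))
    total = sum(counts.values())
    target = phrase.lower()
    if counts[target] == 0:
        return 0.0, None
    rank = 1 + sum(1 for c in counts.values() if c > counts[target])
    return counts[target] / total, rank


# Hypothetical sample texts, not taken from any real dataset.
sample_texts = [
    "jane doe was here",
    "remember jane doe",
    "john smith",
]
name_phrase = " ".join(("Jane", "Doe"))  # first + last name as a 2-gram
freq, rank = frequency_and_rank(sample_texts, name_phrase)
```

Repeating this per day would give a frequency and rank time series for each name, which is the kind of output the cited statement describes querying.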
“…The long-term increase in the rate of retweets is confounded by other factors though, such as changes to Twitter's design and algorithmic curation. To account for this, we define the relative social amplification [32,36] as…”
Section: Measures Of Attention and Amplification (mentioning)
confidence: 99%
“…We draw on a collection of around 10% of all tweets starting in 2008. We take all English language tweets [35, 36] matching the word ‘Trump’ from 2015/01/01 on. We ignore case and accept matches of ‘Trump’ at any location of a tweet (e.g., ‘@RealDonaldTrump’ matches).…”
Section: Methods (mentioning)
confidence: 99%
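
The matching rule quoted above (ignore case, accept the keyword anywhere in the tweet, keep only tweets from 2015/01/01 on) can be sketched as a simple filter. This is a minimal sketch under assumptions: the `(date, text)` tuple layout, the function names, and the sample tweets are hypothetical, and language detection is left out.

```python
from datetime import date


def matches_keyword(text, keyword="Trump"):
    """Case-insensitive substring test: the keyword may appear anywhere,
    including inside handles or longer words."""
    return keyword.casefold() in text.casefold()


def filter_tweets(tweets, keyword="Trump", start=date(2015, 1, 1)):
    """Keep (date, text) pairs on or after start whose text matches."""
    return [
        (day, text)
        for day, text in tweets
        if day >= start and matches_keyword(text, keyword)
    ]


# Hypothetical sample tweets for illustration only.
sample_tweets = [
    (date(2014, 12, 31), "Trump tower visit"),         # too early
    (date(2015, 6, 16), "@RealDonaldTrump announces"),  # matches in handle
    (date(2016, 3, 1), "TRUMP rally tonight"),          # matches, any case
    (date(2016, 3, 2), "nothing relevant here"),        # no match
]
kept = filter_tweets(sample_tweets)
```

Because the test is a plain substring match, unrelated words such as "trumpet" also match; that follows from accepting the keyword at any location rather than requiring word boundaries.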
“…Notably, language is an evolving sociotechnical phenomenon. New words and phrases are created constantly, especially on social media (Alshaabi et al, 2021a). Word usage changes over time.…”
Section: Introduction (mentioning)
confidence: 99%