2015
DOI: 10.1145/2746366

Should We Use the Sample? Analyzing Datasets Sampled from Twitter’s Stream API

Abstract: Researchers have begun studying content obtained from microblogging services such as Twitter to address a variety of technological, social, and commercial research questions. The large number of Twitter users and even larger volume of tweets often make it impractical to collect and maintain a complete record of activity; therefore, most research and some commercial software applications rely on samples, often relatively small samples, of Twitter data. For the most part, sample sizes have been based on availabi…

Cited by 63 publications (49 citation statements)
References 28 publications
“…This also avoids overestimating the importance of infrequent items [61]. We motivate the choice of thresholds for the odds ratio computation in Section 3.4.…”
Section: Methods (mentioning)
confidence: 99%
“…To collect tweets, we created a Twitter app account to gain access to the limited API data Twitter makes freely accessible to all registered app users (Twitter Developer, n.d.). We were able to conduct specific searches for tweets incorporating the #deleteuber hashtag, following similar procedures from extant research utilizing social media as data (Boyd & Crawford, 2012; Chae, 2015; Giglietto & Selva, 2014; Humphreys, Gill, Krishnamurthy, & Newbury, 2013; Kim, Heo, Choi, & Park, 2014; Wang, Callan, & Zheng, 2015).…”
Section: Data Collection (mentioning)
confidence: 99%
“…However, Twitter states a sample size range between 1% and 10% for tweets. Studies that measured this sample size reported a sample size between 0.95% and 9.6% for tweets and between 10% and 45% for users [4,5]. Wang et.…”
Section: Technical Recording Limitations For the Analysis To Be Consi… (mentioning)
confidence: 99%
“…concluded that "the sample datasets truthfully reflect the daily and hourly activity patterns of the Twitter users. (...) Even with a very small sampling ratio (i.e., 0.95%), the sample datasets (...) preserve the relative importance (i.e., frequency of appearance) of the content terms" [5].…”
Section: Technical Recording Limitations For the Analysis To Be Consi… (mentioning)
confidence: 99%