2020
DOI: 10.3390/ijerph17030864

The Story of Goldilocks and Three Twitter’s APIs: A Pilot Study on Twitter Data Sources and Disclosure

Abstract: Public health and social science increasingly use Twitter for behavioral and marketing surveillance. However, few studies provide sufficient detail about Twitter data collection to allow direct comparison between studies or to support replication. The three primary application programming interfaces (APIs) serving as Twitter data sources are Streaming, Search, and Firehose. To date, no clear guidance exists about the advantages and limitations of each API, or about the comparability of the amount, content, and …

Cited by 21 publications (20 citation statements)
References 20 publications
“…The majority of these are now integrated into easy-to-use automatic kits available for Microsoft Excel software or similar (e.g., Real Statistics and Zaiontz, 2021; XLSTAT, 2021), which is a great advantage in terms of operational speed. However, when dealing with platforms such as Twitter, Reddit, Instagram, or Facebook, the collection and analysis of posts is still laborious: indeed, it requires the use of databases already extracted (which limits the power of investigation) or application programming interfaces (APIs), and all datasets must be suitably processed before use (Kim et al., 2020). Therefore, while all of the above methods are essential and powerful for historical data analysis, more immediate and rapid tools are equally necessary for quasi-real-time infoveillance.…”
Section: Introduction
confidence: 99%
“…Publicly available data from Twitter were accessed via Twitter’s Streaming API [46, 47] between 9th March and 15th June 2020, retrieving tweets whose Twitter place field was in Wales. The API returns a random sample of the total tweets from the specified area, up to a maximum of 1% of the total worldwide traffic [46]. The tweets returned by the API contain both the text of the tweet and associated meta-data.…”
Section: Approach
confidence: 99%
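
The location-filtered collection described in the statement above can be sketched in a few lines of Python. The example below is a minimal sketch using the tweepy library against Twitter's v1.1 Streaming API (since retired, shown for illustration only); the bounding box for Wales, the placeholder credentials, and the output file name are assumptions made for the sketch, not details taken from the cited study.

import json
import tweepy

# Placeholder credentials (assumption) -- replace with real application keys.
CONSUMER_KEY = "..."
CONSUMER_SECRET = "..."
ACCESS_TOKEN = "..."
ACCESS_TOKEN_SECRET = "..."

# Approximate bounding box for Wales as [west_lon, south_lat, east_lon, north_lat]
# (illustrative coordinates, not taken from the cited paper).
WALES_BBOX = [-5.5, 51.3, -2.6, 53.5]

class PlaceStream(tweepy.Stream):
    """Appends each matching tweet (text plus metadata) to a JSON-lines file."""

    def on_status(self, status):
        with open("wales_tweets.jsonl", "a", encoding="utf-8") as fh:
            fh.write(json.dumps(status._json) + "\n")

if __name__ == "__main__":
    stream = PlaceStream(CONSUMER_KEY, CONSUMER_SECRET,
                         ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
    # The endpoint delivers only a sampled subset of matching tweets, capped at
    # roughly 1% of total worldwide traffic, as noted in the quoted statement.
    stream.filter(locations=WALES_BBOX)

With the locations parameter, filtering happens server-side: only tweets carrying coordinates or a place bounding box intersecting the supplied box are delivered, so tweets without usable location information are not returned.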
“…Recent research has compared the performance of samples gathered from each of these APIs with a focus on keywords, users, content, and Tweet volume (Tromble et al. [26], Morstatter et al. [18], Wang et al. [27], Pfeffer et al. [19], Kim et al. [13, 14]). Wang et al. [27] verified that the Streaming API and Decahose produce samples that are approximately 1% and 10% of the entire Twitter corpus, but Pfeffer et al. [19] provided cautionary evidence that samples from these APIs may not be random samples and may over-represent certain users or groups.…”
Section: Introduction
confidence: 99%