Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015
DOI: 10.1145/2808797.2809328

Twitter Population Sample Bias and its impact on predictive outcomes

Cited by 26 publications (7 citation statements); references 8 publications.

“…Our study provides some reason for hope: After controlling for demographics, social media's association with vote choice and political attitudes greatly declines. This suggests social media data could be used for studying public opinion and forecasting if the data is appropriately weighted using demographics (which some work has already begun to do (Filho et al., 2015)) and political attitudes or adjusted using an approach such as multilevel regression and post-stratification (Wang et al., 2014). Although social media data provides numerous opportunities for political science, it is vital to remember that Twitter and Facebook are not representative of the general population.…”
Section: Results (mentioning; confidence: 99%)

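The demographic weighting mentioned above can be sketched as simple cell-based post-stratification: each demographic cell receives the weight population share / sample share, so over-represented groups count less in any estimate. This is a minimal sketch under stated assumptions, not the method of Filho et al. (2015) and not multilevel regression and post-stratification (which additionally fits a multilevel model per cell); the function names and the population shares in the example are hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_cells, population_shares):
    """Return a weight per demographic cell: population share / sample share.

    sample_cells: one cell label per sampled user (e.g. an age band).
    population_shares: hypothetical census-style shares per cell, summing to 1.
    Cells present in the population but absent from the sample cannot be
    weighted this way and are simply left out of the result.
    """
    counts = Counter(sample_cells)
    n = len(sample_cells)
    weights = {}
    for cell, count in counts.items():
        sample_share = count / n
        weights[cell] = population_shares[cell] / sample_share
    return weights


def weighted_mean(values, cells, weights):
    """Weighted average of per-user values using each user's cell weight."""
    total_weight = sum(weights[c] for c in cells)
    return sum(v * weights[c] for v, c in zip(values, cells)) / total_weight


if __name__ == "__main__":
    # Hypothetical sample: 3 of 4 users are aged 18-29; population is 50/50.
    cells = ["18-29", "18-29", "18-29", "65+"]
    shares = {"18-29": 0.5, "65+": 0.5}
    w = poststratification_weights(cells, shares)
    # 1 = supports candidate A, 0 = does not (hypothetical outcome).
    print(weighted_mean([1, 1, 1, 0], cells, w))
```

With a sample that is 75% aged 18-29 but a population split 50/50, each younger user gets weight 2/3 and the older user weight 2, which moves the estimate from the raw 0.75 to 0.5.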
“…In the same research it has been detected that Twitter users differ from the general population in terms of demographics, political attitudes and political behaviour. Since biases in these relevant dimensions can be significant, a possible solution for using social media data for studying public opinion and forecasting elections requires the data to be appropriately weighted using demographics, as suggested by [22]. Future research should seek to overcome some of these limitations and integrate the temporal component into the proposed unified framework.…”
Section: Discussion (mentioning; confidence: 99%)

“…The use of the public Twitter streaming API to collect data pre-filtered only for tweets including latitude and longitude coordinates represents a subset of all tweets posted during the time frame of the study. There exists the potential for sampling bias associated with different Twitter APIs that are not representative of all Twitter data (e.g., Firehose data), and data filtered only for geocoded data may omit many conversations from college-aged populations about topics, such as smoking, which may be linked to college-related user groups (see "Limitations" section for more details) (50). Though resulting in a much smaller volume of data, our approach nevertheless allows for detection of tweets in specific geospatial bounds at the high resolution of latitude and longitude coordinates in the state of California.…”
Section: Data Collection (mentioning; confidence: 99%)

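The geocoded pre-filtering described above can be sketched as a bounding-box check on each tweet's exact coordinates. This sketch assumes tweets arrive as dicts shaped like the Twitter API v1.1 Tweet object, whose nullable `coordinates` field is a GeoJSON Point in [longitude, latitude] order; the rounded California bounding box and the function names are illustrative assumptions, not the study's actual collection code, and a rectangle also admits some points in neighboring areas outside the state.

```python
# Approximate, rounded bounding box for California (assumption, not from the
# study): (min_lon, min_lat, max_lon, max_lat).
CA_BBOX = (-124.48, 32.53, -114.13, 42.01)


def in_bbox(lon, lat, bbox):
    """True if the point lies inside the rectangular bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def geocoded_in_bbox(tweets, bbox=CA_BBOX):
    """Keep only tweets that carry exact coordinates inside the box."""
    kept = []
    for tweet in tweets:
        point = tweet.get("coordinates")
        if not point:  # most tweets have no exact coordinates and are dropped
            continue
        lon, lat = point["coordinates"]  # GeoJSON order: longitude first
        if in_bbox(lon, lat, bbox):
            kept.append(tweet)
    return kept


if __name__ == "__main__":
    sample = [
        {"id": 1, "coordinates": {"type": "Point", "coordinates": [-118.24, 34.05]}},
        {"id": 2, "coordinates": None},
        {"id": 3, "coordinates": {"type": "Point", "coordinates": [-74.0, 40.7]}},
    ]
    print([t["id"] for t in geocoded_in_bbox(sample)])
```

Only the Los Angeles point survives; the tweet without coordinates and the New York point are discarded, which illustrates how strongly this kind of filter shrinks the sample.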
“…This method of data collection may have introduced bias in the types of tweets collected, thereby limiting the generalizability of findings as the majority of Twitter users do not geolocate their posts. Potential sampling biases for Twitter include oversampling for certain geographic areas (e.g., there are more Twitter users in the U.S. than in other countries), filtering for specific features (e.g., language, location), and the limitations of the Twitter public streaming API (used in this study) in lieu of other data collection approaches (e.g., Twitter REST and SEARCH APIs) (50). Future studies should examine the use of multiple Twitter APIs to generate a more representative Twitter dataset (including different strategies for filtering, demographic characterization, and purposeful user sampling) and complement findings with other traditional sources of data (e.g., survey data, focus groups, clinical records, etc.)…”
Section: Limitations (mentioning; confidence: 99%)