Researchers and policy makers worldwide are interested in measuring the subjective well-being of populations. When users post on social media, they leave behind digital traces that reflect their thoughts and feelings. Aggregation of such digital traces may make it possible to monitor well-being at large scale. However, social media-based methods need to be robust to regional effects if they are to produce reliable estimates. Using a sample of 1.53 billion geotagged English tweets, we provide a systematic evaluation of word-level and data-driven methods for text analysis for generating well-being estimates for 1,208 US counties. We compared Twitter-based county-level estimates with well-being measurements provided by the Gallup-Sharecare Well-Being Index survey through 1.73 million phone surveys. We find that word-level methods (e.g., Linguistic Inquiry and Word Count [LIWC] 2015 and Language Assessment by Mechanical Turk [LabMT]) yielded inconsistent county-level well-being measurements due to regional, cultural, and socioeconomic differences in language use. However, removing as few as three of the most frequent words led to notable improvements in well-being prediction. Data-driven methods provided robust estimates, approximating the Gallup data at up to r = 0.64. We show that the findings generalized to county socioeconomic and health outcomes and were robust when poststratifying the samples to be more representative of the general US population. Regional well-being estimation from social media data seems to be robust when supervised data-driven methods are used.
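The word-level methods the abstract names (LIWC, LabMT) score text by averaging per-word lexicon values, and the reported fix is to drop a few of the most frequent words. The sketch below illustrates that idea only; the lexicon values and the excluded words are made up for illustration and are not the real LabMT scores or the study's actual word list.

```python
# Minimal sketch of word-level (LabMT-style) well-being scoring.
# Lexicon values here are illustrative placeholders, NOT real LabMT scores.
from collections import Counter

LEXICON = {
    "happy": 8.3, "love": 8.4, "good": 7.9,
    "sad": 2.4, "bad": 3.2, "just": 5.8, "like": 7.2,
}

def lexicon_score(words, lexicon, exclude=frozenset()):
    """Frequency-weighted mean lexicon score, skipping excluded words."""
    counts = Counter(w for w in words if w in lexicon and w not in exclude)
    total = sum(counts.values())
    if total == 0:
        return None  # no scorable words
    return sum(lexicon[w] * c for w, c in counts.items()) / total

tweets = "just like good love just happy like just".split()

# Raw estimate is pulled toward frequent, weakly evaluative words.
raw = lexicon_score(tweets, LEXICON)
# Excluding a few very frequent words shifts the estimate noticeably,
# mirroring the abstract's point about removing top frequent words.
adjusted = lexicon_score(tweets, LEXICON, exclude={"just", "like"})
print(round(raw, 2), round(adjusted, 2))
```

Because high-frequency words dominate the frequency-weighted mean, a handful of them can mask regional differences in the rarer, more evaluative vocabulary, which is consistent with the abstract's finding that removing as few as three frequent words improves the county-level estimates.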
Many hoped that social networking sites would allow for the open exchange of information and a revival of the public sphere. Unfortunately, conversations on social media are often toxic and not conducive to healthy political discussions. Twitter, the most widely used social network for political discussions, doubled the limit of characters in a tweet in November 2017, which provided an opportunity to study the effect of technological affordances on political discussions using a discontinuous time series design. Using supervised and unsupervised natural language processing methods, we analyzed 358,242 tweet replies to U.S. politicians from January 2017 to March 2018. We show that doubling the permissible length of a tweet led to less uncivil, more polite, and more constructive discussions online. However, the declining trend in the empathy and respectfulness of these tweets raises concerns about the implications of the changing norms for the quality of political deliberation.
This study introduces and evaluates the robustness of different volumetric, sentiment, and social network approaches to predicting elections in three Asian countries (Malaysia, India, and Pakistan) from Twitter posts. We find that the predictive power of social media is good for India and Pakistan but not for Malaysia. Overall, we find it useful to consider the recency of Twitter posts when using them to predict a real-world outcome, such as an election result. Sentiment information mined using machine learning models was the most accurate predictor of election outcomes. Social network information is stable despite sudden surges in political discussion, e.g., around election-related news events. Methods combining sentiment and volume information, or sentiment and social network information, are effective at predicting smaller vote shares, e.g., those of independent candidates and regional parties. We conclude with a detailed discussion of the caveats of social media analysis for predicting real-world outcomes and recommendations for future work.
Purpose: The purpose of this study is to analyze the macro-level discourse structure of literature reviews found in information science journal papers, and to identify different styles of literature review writing. Although there have been several studies of human abstracting, there are hardly any studies of how authors construct literature reviews.

Design/methodology/approach: This study is carried out in the context of a project to develop a summarization system to generate literature reviews automatically. A coding scheme was developed to annotate the high-level organization of literature reviews, focusing on the types of information. Two sets of annotations were used to check inter-coder reliability.

Findings: It was found that literature reviews are written in two distinctive styles, with different discourse structures. Descriptive literature reviews summarize individual papers/studies and provide more information on each study, such as research methods, results and interpretation. Integrative literature reviews provide fewer details of individual papers/studies, but focus on ideas and results extracted from these papers. They provide critical summaries of topics, and have a more complex structure of topics and sub-topics. The reviewer's voice is also more dominant.

Originality/value: The coding scheme is useful for annotating the macro-level discourse structure of literature reviews, and can be used for studying literature reviews in other fields. The basic characteristics of two styles of literature review writing are identified. The results have provided a foundation for further studies of literature reviews – to identify discourse relations and rhetorical functions employed in literature reviews, and their linguistic expressions.