Dominated by quantitative data science techniques, social media data analysis often fails to incorporate the surrounding context, conversation, and metadata that allow for more complete, accurate, and informed analysis. Here we describe the development of a scalable data collection infrastructure to interrogate massive numbers of tweets, including complete user conversations, to perform contextualized social media analysis. Additionally, we discuss the nuances of location metadata and incorporate it, when available, to situate user conversations within their geographic context through an interactive map. The map also spatially clusters tweets to identify important locations and movement between them, illuminating specific behaviors such as evacuating before a hurricane. We share performance details and the promising results of concurrent research utilizing this infrastructure, and we discuss the challenges and ethics of using context-rich datasets.