The objective of this study is to propose and test a hybrid machine learning pipeline that uncovers how disaster events unfold at different locations, using social media posts made during disasters. Effective disaster response and recovery require a comprehensive understanding of the disaster situation, i.e., the unfolding of disaster events and the geographic distribution of disruptions. Existing studies have employed machine learning methods to conduct coarse-grained event detection and to analyze geographic location information from geotagged social media data. However, only a very small fraction of social media data is geotagged, and the geotag may not correspond to the events described in the content of a post. In addition, the coarse-grained information detected by existing approaches is token-based and does not provide sufficient information for situation awareness. Hence, detecting locations and finer-grained event information could significantly improve the utility, credibility, and interpretability of social media data for situation awareness. To address these limitations, this study proposes a hybrid machine learning pipeline that makes use of all relevant tweets to uncover the evolution of disaster events across different locations. The pipeline integrates Named Entity Recognition for detecting locations mentioned in the posts, a location fusion approach for extracting the coordinates of those locations and removing noise, a fine-tuned BERT model for classifying posts into humanitarian categories, and graph-based clustering for identifying credible situational information. The application of the pipeline is demonstrated using a data set collected from Twitter during the 2017 Hurricane Harvey in Houston. The results show the capability of the proposed hybrid pipeline for automated mapping of events across time and space from social media posts with considerable accuracy.
The findings also suggest the potential for forensic analysis of disasters based on the mapped events, their evolution, and the variation of social media attention to different locations. Hence, this method could provide a useful tool to support emergency managers, public officials, residents, first responders, and other stakeholders in rapid situation awareness across time and space.
INDEX TERMS Machine learning pipeline, social media, disasters, automated mapping.
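As a rough illustration of two of the pipeline stages named above, the following minimal sketch uses only the Python standard library. It is not the authors' implementation: a small hypothetical gazetteer lookup stands in for Named Entity Recognition and location fusion, word-overlap (Jaccard) similarity with an assumed threshold stands in for the study's similarity measure, and clusters are taken to be connected components of the resulting graph. The BERT classification stage is omitted. All function names, the threshold, and the sample posts are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical gazetteer (assumption): place name -> approximate (lat, lon).
GAZETTEER = {
    "houston": (29.76, -95.37),
    "katy": (29.79, -95.82),
}


def find_locations(text):
    """Stand-in for NER + location fusion: gazetteer places named in the text."""
    tokens = set(text.lower().split())
    return sorted(name for name in GAZETTEER if name in tokens)


def jaccard(a, b):
    """Word-overlap similarity between two posts (assumed similarity measure)."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_posts(posts, threshold=0.5):
    """Link posts whose similarity meets the threshold, then return the
    connected components of that graph as lists of post indices."""
    adjacency = defaultdict(set)
    for i, j in combinations(range(len(posts)), 2):
        if jaccard(posts[i], posts[j]) >= threshold:
            adjacency[i].add(j)
            adjacency[j].add(i)

    seen, clusters = set(), []
    for start in range(len(posts)):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return clusters


# Made-up sample posts, for illustration only.
posts = [
    "flooding reported in katy near the highway",
    "flooding reported in katy near the school",
    "shelter open in houston downtown",
]

for members in cluster_posts(posts):
    places = sorted({p for i in members for p in find_locations(posts[i])})
    print(members, places, [GAZETTEER[p] for p in places])
```

In this toy setting, posts that corroborate one another fall into the same component, which loosely mirrors the idea of using clustering to surface credible situational information; a real system would need a trained NER model, geocoding, and a more robust similarity measure.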