As of July 17, 2020, more than thirteen million people have been diagnosed with the novel coronavirus disease (COVID-19), and half a million people have already lost their lives to this infectious disease. The World Health Organization declared the COVID-19 outbreak a pandemic on March 11, 2020. Since then, social media platforms have experienced an exponential rise in content related to the pandemic. In the past, Twitter data have been observed to be indispensable for extracting situational awareness information during crises. This paper presents the COV19Tweets Dataset (Lamsal 2020a), a large-scale Twitter dataset with more than 310 million COVID-19-specific English-language tweets and their sentiment scores. The dataset's geo version, the GeoCOV19Tweets Dataset (Lamsal 2020b), is also presented. The paper discusses the datasets' design in detail and analyzes the tweets in both datasets. The datasets are released publicly in the expectation that they will contribute to a better understanding of the spatial and temporal dimensions of the public discourse related to the ongoing pandemic. According to the access statistics, the datasets (Lamsal 2020a, 2020b) have been accessed more than 74.5k times collectively.
The rise of social media platforms provides an unbounded, infinitely rich source of aggregate knowledge of the world around us, both historic and real-time, from a human perspective. The greatest challenge we face is how to process and understand this raw and unstructured data, go beyond individual observations, and see the “big picture”, which is the domain of Situation Awareness. We provide an extensive survey of Artificial Intelligence research, focusing on microblog social media data with applications to Situation Awareness, that gives the seminal work and state-of-the-art approaches across six thematic areas: Crime, Disasters, Finance, Physical Environment, Politics, and Health and Population. We provide a novel, unified methodological perspective, identify key results and challenges, and present ongoing research directions.
During disaster events such as floods, landslides, earthquakes, tsunamis, and fires, social media platforms provide easy and timely access to information about the ongoing crisis and thereby become an essential vehicle for information sharing. During such events, large amounts of socially generated data become available, which can be accessed and processed to extract situational awareness insights. These insights, in turn, can be used to make disaster response more effective and efficient in order to minimize the loss of lives and damage to property. People actively use social platforms like Facebook and Twitter to post information related to crisis events. Further, these platforms let people learn the location and safety status of their family and friends during such events. Twitter, the microblogging platform, sees thousands of informally written tweets during crisis events, and since it provides high-level APIs to access its near real-time feed, it has become the primary source of data for researchers. It is generally observed that there is an exponential burst in the number of tweets during an ongoing crisis event. This sudden burst makes the task of monitoring, identifying, and processing each tweet virtually impossible for a human. However, such voluminous data can be processed using various machine learning and natural language processing techniques combined with a certain level of human intervention. This paper focuses on designing a semi-automated artificial intelligence-based classifier that can classify the plethora of disaster-related tweets into categories such as community needs, loss of lives, and damage.
Cricket, especially the Twenty20 format, is highly uncertain: a single over can completely change the momentum of the game. With millions of people following the Indian Premier League (IPL), developing a model for predicting the outcome of its matches is a real-world problem. A cricket match depends upon various factors, and in this work, the factors that significantly influence the outcome of a Twenty20 cricket match are identified. Each player's performance in the field is considered to find the overall weight (relative strength) of the team. A multivariate regression-based solution is proposed to calculate the points of each player in the league, and the overall weight of a team is computed from the past performance of the players who have appeared most for the team. Finally, a dataset was modeled on the seven identified factors that influence the outcome of an IPL match. Six machine learning models were trained and used to predict the outcome of each 2018 IPL match 15 minutes before gameplay, immediately after the toss. The prediction results are impressive. The problems with the dataset and how the accuracy of the classifier can be improved further are discussed.
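The team-weight step described above can be illustrated with a minimal sketch. The abstract does not say how player points are aggregated, so the plain average used here is an assumption, as are the function name `team_weight`, the `squad_size` parameter, and the sample numbers, which are hypothetical and not taken from the paper's regression.

```python
from statistics import mean


def team_weight(player_points, appearances, squad_size=11):
    """Average the points of the players who appeared most often for a team.

    Assumption: a simple mean is used for aggregation; the abstract does not
    specify the actual formula.
    """
    # Rank players by appearances (descending); break ties by name so the
    # selection is deterministic.
    regulars = sorted(appearances, key=lambda p: (-appearances[p], p))[:squad_size]
    return mean(player_points[p] for p in regulars)


# Hypothetical player points and appearance counts, for illustration only.
points = {"A": 80.0, "B": 60.0, "C": 40.0, "D": 90.0}
games = {"A": 14, "B": 12, "C": 3, "D": 10}

# With squad_size=3, players A, B, and D are selected and C is left out.
weight = team_weight(points, games, squad_size=3)
```

Selecting by appearance count keeps occasional players from distorting the estimate of a team's relative strength, which appears to be the intent of the approach summarized above.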