Publicly available social media archives facilitate research in the social sciences and provide corpora for training and testing a wide range of machine learning and natural language processing methods. With respect to the recent outbreak of the Coronavirus disease 2019 (COVID-19), online discourse on Twitter reflects public opinion and perception related to the pandemic itself as well as mitigating measures and their societal impact. Understanding such discourse, its evolution, and interdependencies with real-world events or (mis)information can foster valuable insights. On the other hand, such corpora are crucial facilitators for computational methods addressing tasks such as sentiment analysis, event detection, or entity recognition. However, obtaining, archiving, and semantically annotating large amounts of tweets is costly. In this paper, we describe TweetsCOV19, a publicly available knowledge base of currently more than 8 million tweets, spanning October 2019-April 2020. Metadata about the tweets as well as extracted entities, hashtags, user mentions, sentiments, and URLs are exposed using established RDF/S vocabularies, providing an unprecedented knowledge base for a range of knowledge discovery tasks. Next to a description of the dataset and its extraction and annotation process, we present an initial analysis and use cases of the corpus.
Social tags are known to be a valuable source of information for image retrieval and organization. However, contrary to the conventional document retrieval, rich tag frequency information in social sharing systems, such as Flickr, is not available, thus we cannot directly use the tag frequency (analogous to the term frequency in a document) to represent the relevance of tags. Many heuristic approaches have been proposed to address this problem, among which the well-known neighbor voting based approaches are the most effective methods. The basic assumption of these methods is that a tag is considered as relevant to the visual content of a target image if this tag is also used to annotate the visual neighbor images of the target image by lots of different users. The main limitation of these approaches is that they treat the voting power of each neighbor image either equally or simply based on its visual similarity. In this paper, we cast the social tag relevance learning problem as an adaptive teleportation random walk process on the voting graph.In particular, we model the relationships among images by constructing a voting graph, and then propose an adaptive teleportation random walk, in which a confidence factor is introduced to control the teleportation probability, on the voting graph. Through this process, direct and indirect relationships among images can be explored to cooperatively estimate the tag relevance. To quantify the performance of our approach, we compare it with state-of-the-art methods on two publicly available datasets (NUS-WIDE and MIR Flickr). The results indicate that our method achieves substantial performance gains on these datasets.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.