Abstract-The increasing pervasiveness of social media creates new opportunities to study human social behavior, while challenging our capability to analyze their massive data streams. One of the emerging tasks is to distinguish between different kinds of activities, for example engineered misinformation campaigns versus spontaneous communication. Such detection problems require a formal definition of meme, or unit of information that can spread from person to person through the social network. Once a meme is identified, supervised learning methods can be applied to classify different types of communication. The appropriate granularity of a meme, however, is hardly captured from existing entities such as tags and keywords. Here we present a framework for the novel task of detecting memes by clustering messages from large streams of social data. We evaluate various similarity measures that leverage content, metadata, network features, and their combinations. We also explore the idea of pre-clustering on the basis of existing entities. A systematic evaluation is carried out using a manually curated dataset as ground truth. Our analysis shows that pre-clustering and a combination of heterogeneous features yield the best trade-off between number of clusters and their quality, demonstrating that a simple combination based on pairwise maximization of similarity is as effective as a non-trivial optimization of parameters. Our approach is fully automatic, unsupervised, and scalable for real-time detection of memes in streaming data.
The use of quantitative metrics to gauge the impact of scholarly publications, authors, and disciplines is predicated on the availability of reliable usage and annotation data. Citation and download counts are widely available from digital libraries. However, current annotation systems rely on proprietary labels, refer to journals but not articles or authors, and are manually curated. To address these limitations, we propose a social framework based on crowdsourced annotations of scholars, designed to keep up with the rapidly evolving disciplinary and interdisciplinary landscape. We describe a system called Scholarometer, which provides a service to scholars by computing citation-based impact measures. This creates an incentive for users to provide disciplinary annotations of authors, which in turn can be used to compute disciplinary metrics. We first present the system architecture and several heuristics to deal with noisy bibliographic and annotation data. We report on data sharing and interactive visualization services enabled by Scholarometer. Usage statistics, illustrating the data collected and shared through the framework, suggest that the proposed crowdsourcing approach can be successful. Secondly, we illustrate how the disciplinary bibliometric indicators elicited by Scholarometer allow us to implement for the first time a universal impact measure proposed in the literature. Our evaluation suggests that this metric provides an effective means for comparing scholarly impact across disciplinary boundaries.
The problem of clustering content in social media has pervasive applications, including the identification of discussion topics, event detection, and content recommendation. Here we describe a streaming framework for online detection and clustering of memes in social media, specifically Twitter. A pre-clustering procedure, namely protomeme detection, first isolates atomic tokens of information carried by the tweets. Protomemes are thereafter aggregated, based on multiple similarity measures, to obtain memes as cohesive groups of tweets reflecting actual concepts or topics of discussion. The clustering algorithm takes into account various dimensions of the data and metadata, including natural language, the social network, and the patterns of information diffusion. As a result, our system can build clusters of semantically, structurally, and topically related tweets. The clustering process is based on a variant of Online K-means that incorporates a memory mechanism, used to "forget" old memes and replace them over time with the new ones. The evaluation of our framework is carried out by using a dataset of Twitter trending topics. Over a one-week period, we systematically determined whether our algorithm was able to recover the trending hashtags. We show that the proposed method outperforms baseline algorithms that only use content features, as well as a state-of-the-art event detection method that assumes full knowledge of the underlying follower network. We finally show that our online learning framework is flexible, due to its independence of the adopted clustering algorithm, and best suited to work in a streaming scenario.
Scholarometer (scholarometer.indiana.edu) is a social tool developed to facilitate citation analysis and help evaluate the impact of authors. The Scholarometer service allows scholars to compute various citation-based impact measures. In exchange, users provide disciplinary annotations of authors, which allow for the computation of discipline-specific statistics and discipline-neutral impact metrics. We present here two improvements of our system. First, we integrated a new universal impact metric hs that uses crowdsourced data to calculate the global rank of a scholar across disciplinary boundaries. Second, improvements made in ambiguous name classification have increased the accuracy from 80% to 87%.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.