Social media has been established to bear signals relating to health and well-being states. In this paper, we investigate the potential of social media in characterizing and understanding abstinence from tobacco or alcohol use. While the link between behavior and addiction has been explored in psychology literature, the lack of longitudinal self-reported data on long-term abstinence has challenged addiction research. We leverage the activity spanning almost eight years on two prominent communities on Reddit: StopSmoking and StopDrinking. We use the self-reported “badge” information of nearly a thousand users as gold standard information on their abstinence status to characterize long-term abstinence. We build supervised learning based statistical models that use the linguistic features of the content shared by the users as well as the network structure of their social interactions. Our findings indicate that long-term abstinence from smoking or drinking (~one year) can be distinguished from short-term abstinence (~40 days) with 85% accuracy. We further show that language and interaction on social media offer powerful cues towards characterizing these addiction-related health outcomes. We discuss the implications of our findings in social media and health research, and in the role of social media as a platform for positive behavior change and therapy.
Local Differential Privacy (LDP) is popularly used in practice for privacy-preserving data collection. Although existing LDP protocols offer high data utility for large user populations (100,000 or more users), they perform poorly in scenarios with small user populations (such as those in the cybersecurity domain) and lack perturbation mechanisms that are effective for both ordinal and non-ordinal item sequences while protecting sequence length and content simultaneously. In this paper, we address the small user population problem by introducing the concept of Condensed Local Differential Privacy (CLDP) as a specialization of LDP, and develop a suite of CLDP protocols that offer desirable statistical utility while preserving privacy. Our protocols support different types of client data, ranging from ordinal data types in finite metric spaces (numeric malware infection statistics), to non-ordinal items (OS versions, transaction categories), and to sequences of ordinal and non-ordinal items. Extensive experiments are conducted on multiple datasets, including datasets that are an order of magnitude smaller than those used in existing approaches, which show that proposed CLDP protocols yield higher utility compared to existing LDP protocols. Furthermore, case studies with Symantec datasets demonstrate that our protocols outperform existing protocols in key cybersecurity-focused tasks of detecting ransomware outbreaks, identifying targeted and vulnerable OSs, and inspecting suspicious activities on infected machines.
We study the problem of determining the proper aggregation granularity for a stream of time-stamped edges. Such streams are used to build time-evolving networks, which are subsequently used to study topics such as network growth. Currently, aggregation lengths are chosen arbitrarily, based on intuition or convenience. We describe ADAGE, which detects the appropriate aggregation intervals from streaming edges and outputs a sequence of structurally mature graphs. We demonstrate the value of ADAGE in automatically finding the appropriate aggregation intervals on edge streams for belief propagation to detect malicious files and machines.
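To make the aggregation problem concrete, the following minimal sketch shows the fixed-interval baseline that the abstract describes as being chosen arbitrarily; it is not ADAGE itself. The function name `aggregate_edges`, the `(timestamp, source, target)` tuple layout, and the toy stream are assumptions made for illustration only.

```python
from collections import defaultdict


def aggregate_edges(edges, window):
    """Group (timestamp, u, v) edges into consecutive fixed-length windows.

    Returns one edge set per non-empty window, in time order.
    Hypothetical helper: this is the naive fixed-interval scheme,
    not the ADAGE method.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    snapshots = defaultdict(set)
    for t, u, v in edges:
        # Each edge falls into the window whose index is floor(t / window).
        snapshots[int(t // window)].add((u, v))
    return [snapshots[k] for k in sorted(snapshots)]


# Toy stream (assumed data): the same edges give different graph
# sequences depending on the chosen window length.
stream = [(0, "a", "b"), (6, "b", "c"), (12, "a", "c"), (14, "c", "d")]
coarse = aggregate_edges(stream, window=10)
fine = aggregate_edges(stream, window=5)
```

With the toy stream, a window of 10 produces two snapshots while a window of 5 produces three, which shows how strongly the resulting graph sequence depends on a parameter that is otherwise picked by hand.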
Given a large graph with millions of nodes and edges, say a social network whose nodes and edges both carry multiple attributes (e.g., job titles, tie strengths), how can we quickly find subgraphs of interest (e.g., a ring of businessmen with strong ties)? We present MAGE, a scalable, multicore subgraph matching approach that supports expressive queries over large, richly attributed graphs. Our major contributions are: (1) MAGE supports graphs with both node and edge attributes (most existing approaches handle only one of the two); (2) it supports expressive queries, allowing multiple attributes on an edge, wildcards as attribute values (i.e., match any permissible value), and attributes with continuous values; and (3) it is scalable, supporting graphs with several hundred million edges. We demonstrate MAGE's effectiveness and scalability via extensive experiments on large real and synthetic graphs, such as a Google+ social network with 460 million edges.
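The kind of query described above can be illustrated with a small sketch of attribute matching on a single edge, using wildcards and a continuous range. This is an assumed, simplified illustration and not MAGE's algorithm or API; the names `WILDCARD`, `attrs_match`, and `match_edges`, the attribute keys, and the toy graph are all hypothetical.

```python
WILDCARD = "*"  # assumed marker meaning "any permissible value"


def attrs_match(query, attrs):
    """Return True if every queried attribute is satisfied by attrs.

    A query value may be an exact value, WILDCARD, or a (low, high)
    tuple describing an inclusive range for a continuous attribute.
    """
    for key, wanted in query.items():
        if key not in attrs:
            return False
        value = attrs[key]
        if wanted == WILDCARD:
            continue
        if isinstance(wanted, tuple):
            low, high = wanted
            if not (low <= value <= high):
                return False
        elif value != wanted:
            return False
    return True


def match_edges(nodes, edges, src_query, edge_query, dst_query):
    """Return (u, v) pairs whose endpoints and edge satisfy the queries."""
    return [
        (u, v)
        for u, v, edge_attrs in edges
        if attrs_match(src_query, nodes[u])
        and attrs_match(edge_query, edge_attrs)
        and attrs_match(dst_query, nodes[v])
    ]


# Toy graph (assumed data) with node and edge attributes.
nodes = {
    "ann": {"job": "businessman"},
    "bob": {"job": "businessman"},
    "cat": {"job": "engineer"},
}
edges = [
    ("ann", "bob", {"strength": 0.9, "type": "colleague"}),
    ("ann", "cat", {"strength": 0.95, "type": "friend"}),
    ("bob", "ann", {"strength": 0.3, "type": "colleague"}),
]
results = match_edges(
    nodes,
    edges,
    src_query={"job": "businessman"},
    edge_query={"strength": (0.8, 1.0), "type": WILDCARD},
    dst_query={"job": "businessman"},
)
```

Here only the strong tie from `ann` to `bob` matches. A full subgraph matcher would combine such per-edge checks across a whole query pattern (such as a ring), which is where the scalability challenges addressed by the paper arise.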
With an average length reduction of 80%, URL shorteners have become the norm for sharing URLs on Twitter, mainly due to the 140-character limit per message. Unfortunately, spammers have also adopted URL shorteners to camouflage their spam URLs and improve user click-through. In this paper, we measure the misuse of short URLs and analyze the characteristics of spam and non-spam short URLs. We utilize these measurements to enable the detection of spam short URLs. To achieve this, we collected short URLs from Twitter and retrieved their click traffic data from Bitly, a popular URL shortening system. We first investigate the creators of over 600,000 Bitly short URLs to characterize short URL spammers. We then analyze the click traffic generated from various countries and referrers, and determine the top click sources for spam and non-spam short URLs. Our results show that the majority of the clicks are from direct sources and that the spammers utilize popular websites to attract more attention by cross-posting the links. We then use the click traffic data to classify the short URLs into spam vs. non-spam and compare the performance of the selected classifiers on the dataset. We determine that the Random Tree algorithm achieves the best performance with an accuracy of 90.81% and an F-measure value of 0.913.