Targeted advertising is a key characteristic of online as well as traditional-media marketing. However, it is very limited in outdoor advertising, that is, campaigns run by means of billboards in public places. The reason is the lack of information about the interests of the particular passersby, beyond very imprecise and aggregate demographic or traffic estimates. In this work we propose a methodology for performing targeted outdoor advertising by leveraging social media. In particular, we use the Twitter social network to gather information about users’ degree of interest in given advertising categories and about the common routes that they follow, thereby characterizing each zone in a given city. We then use this characterization to recommend physical locations for advertising: given an advertisement category, we estimate the most promising areas for placing an ad so as to maximize its targeted effectiveness. We show that our approach selects better advertising locations than a baseline reflecting a current ad-placement policy. To the best of our knowledge, this is the first work on offline advertising in urban areas that makes use of (publicly available) data from social networks.
Background: Distributed approaches based on the MapReduce programming paradigm have started to be proposed in the Bioinformatics domain, due to the large amount of data produced by next-generation sequencing techniques. However, the use of MapReduce and related Big Data technologies and frameworks (e.g., Apache Hadoop and Spark) does not necessarily produce satisfactory results, in terms of both efficiency and effectiveness. We discuss how the development of distributed and Big Data management technologies has affected the analysis of large datasets of biological sequences. Moreover, we show how the choice of parameter configurations and the careful engineering of the software with respect to the specific framework under consideration may be crucial to achieving good performance, especially on very large amounts of data. We choose k-mer counting as a case study for our analysis, and Spark as the framework to implement FastKmer, a novel approach for the extraction of k-mer statistics from large collections of biological sequences, with arbitrary values of k. Results: One of the most relevant contributions of FastKmer is the introduction of a module for balancing the statistics aggregation workload over the nodes of a computing cluster, in order to overcome data skew while allowing for full exploitation of the underlying distributed architecture. We also present the results of a comparative experimental analysis showing that our approach is currently the fastest among those based on Big Data technologies, while exhibiting very good scalability. Conclusions: We provide evidence that the use of technologies such as Hadoop or Spark for the analysis of big datasets of biological sequences is productive only if the architectural details and the peculiar aspects of the considered framework are carefully taken into account in the algorithm design and implementation.
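To make the k-mer counting task concrete, the following is a minimal single-machine sketch in plain Python. It is not FastKmer and does not use Spark; the function names (`count_kmers`, `count_kmers_partitioned`, `merge_partitions`) and the CRC32-based routing of k-mers to partitions are assumptions introduced here only to illustrate the general map-then-aggregate pattern the abstract refers to.

```python
import zlib
from collections import Counter


def count_kmers(sequence, k):
    """Count every length-k substring (k-mer) of a single sequence."""
    # A sequence shorter than k yields an empty range, hence an empty Counter.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def count_kmers_partitioned(sequences, k, num_partitions):
    """Route each k-mer count to a partition chosen by a hash of the k-mer.

    This mimics, in one process, how a distributed job might shard the
    aggregation work. Hash-based routing is an illustrative assumption,
    not the balancing strategy described in the abstract.
    """
    partitions = [Counter() for _ in range(num_partitions)]
    for seq in sequences:
        for kmer, n in count_kmers(seq, k).items():
            index = zlib.crc32(kmer.encode("ascii")) % num_partitions
            partitions[index][kmer] += n
    return partitions


def merge_partitions(partitions):
    """Combine per-partition counts into one global Counter."""
    total = Counter()
    for part in partitions:
        total.update(part)
    return total


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACG"]
    totals = merge_partitions(count_kmers_partitioned(reads, k=3, num_partitions=4))
    for kmer, n in sorted(totals.items()):
        print(kmer, n)
```

With plain hash routing, a few very frequent k-mers can overload a single partition. That is the kind of data skew the abstract says its balancing module is designed to overcome.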
The Jaccard index is an important similarity measure for item sets and Boolean data. On large datasets, an exact similarity computation is often infeasible for all item pairs due to both time and space constraints, giving rise to faster approximate methods. The algorithm of choice used to quickly compute the Jaccard index |A∩B| / |A∪B| of two item sets A and B is usually a form of min-hashing. Most min-hashing schemes are maintainable in data streams that process only additions, but none are known to work when facing item-wise deletions. In this paper, we investigate scalable approximation algorithms for rational set similarities, a broad class of similarity measures including Jaccard. Motivated by a result of Chierichetti and Kumar [J. ACM 2015], who showed that any rational set similarity S admits a locality sensitive hashing (LSH) scheme if and only if the corresponding distance 1 − S is a metric, we show that there exists a space-efficient summary maintaining a (1 ± ε) multiplicative approximation to 1 − S in dynamic data streams. This in turn also yields an ε-additive approximation of the similarity. The existence of these approximations hints at, but does not directly imply, an LSH scheme in dynamic data streams. Our second and main contribution lies in the design of such an LSH scheme maintainable in dynamic data streams. The scheme is space efficient, easy to implement and, to the best of our knowledge, the first of its kind able to process deletions.
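For reference, classic min-hashing for insertion-only data can be sketched as follows. This is a textbook-style illustration, not the dynamic-stream scheme proposed in the paper, and it cannot handle deletions. The names (`make_hashes`, `signature`, `estimate_jaccard`), the linear hash family modulo a Mersenne prime, and the restriction to non-empty sets of non-negative integers below that prime are all assumptions made for this example.

```python
import random

_PRIME = (1 << 61) - 1  # Mersenne prime used as the hash modulus (assumption)


def exact_jaccard(a, b):
    """Exact Jaccard index |A∩B| / |A∪B|; two empty sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def make_hashes(num_hashes, seed=0):
    """Draw (a, b) coefficient pairs for hash functions x -> (a*x + b) mod p."""
    rng = random.Random(seed)
    return [(rng.randrange(1, _PRIME), rng.randrange(_PRIME))
            for _ in range(num_hashes)]


def signature(items, hashes):
    """Min-hash signature: the minimum hash value of the set per hash function.

    Assumes `items` is a non-empty collection of integers in [0, _PRIME).
    """
    return [min((a * x + b) % _PRIME for x in items) for a, b in hashes]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where the two signatures agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    hashes = make_hashes(256, seed=42)
    A, B = set(range(0, 100)), set(range(50, 150))
    print("exact:   ", exact_jaccard(A, B))
    print("estimate:", estimate_jaccard(signature(A, hashes), signature(B, hashes)))
```

A signature built this way can absorb new items by taking a further minimum, but removing an item may invalidate the stored minimum. This is why such schemes are limited to insertion-only streams, the gap the abstract addresses.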
Computer networks are undergoing phenomenal growth, driven by the rapidly increasing number of nodes constituting the networks. At the same time, the number of security threats on Internet and intranet networks is constantly growing, and the testing and experimentation of cyber defense solutions require the availability of separate test environments that best emulate the complexity of a real system. Such environments support the deployment and monitoring of complex mission-driven network scenarios, thus enabling the study of cyber defense strategies under real and controllable traffic and attack scenarios. In this paper, we propose a methodology that combines network and security assessment techniques with cloud technologies to build an emulation environment with an adjustable degree of affinity with respect to actual reference networks or planned systems. As a byproduct, starting from a specific case study, we collected a dataset consisting of complete network traces comprising benign and malicious traffic, which is feature-rich and publicly available.