We investigate a problem at the intersection of machine learning and security: training-set attacks on machine learners. In such attacks an attacker contaminates the training data so that a specific learning algorithm would produce a model profitable to the attacker. Understanding training-set attacks is important as more intelligent agents (e.g. spam filters and robots) are equipped with learning capability and can potentially be hacked via data they receive from the environment. This paper identifies the optimal training-set attack on a broad family of machine learners. First we show that optimal training-set attack can be formulated as a bilevel optimization problem. Then we show that for machine learners with certain Karush-Kuhn-Tucker conditions we can solve the bilevel problem efficiently using gradient methods on an implicit function. As examples, we demonstrate optimal training-set attacks on Support VectorMachines, logistic regression, and linear regression with extensive experiments. Finally, we discuss potential defenses against such attacks.
Abstract-The first step to deal with the significant issue of air pollution in China and elsewhere in the world is to monitor it. While more physical monitoring stations are built, current coverage is limited to large cities with most other places undermonitored. In this paper we propose a complementary approach to monitor Air Quality Index (AQI): using machine learning models to estimate AQI from social media posts. We propose a series of progressively more sophisticated machine learning models, culminating in a Markov Random Field model that utilizes the text content in social media as well as the spatiotemporal correlation among cities and days. Our extensive experiments on Sina Weibo data from 108 cities during a one-month period demonstrate the accurate AQI prediction performance of our approach.
Sponsored search is the primary business for today's commercial search engines. Accurate prediction of the Click-Through Rate (CTR) for ads is key to displaying relevant ads to users. In this paper, we systematically study the two kinds of contextual factors influencing the CTR: 1) In micro factors, we focus on the factors for mainline ads, including ad depth, query diversity, ad interaction. 2) In macro factors, we try to understand the correlations of clicks between organic search and sponsored search. Based on this data analysis, we propose novel click models which harvest these new explored factors. To the best of our knowledge, this is the first paper to examine and model the effects of the above contextual factors in sponsored search. Extensive experiments on large-scale real-world datasets show that by incorporating these contextual factors, our novel click models can outperform state-of-the-art methods.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.