The sharing of databases either within or across organizations raises the possibility of unintentionally revealing sensitive relationships contained in them. Recent advances in data-mining technology have increased the chances of such disclosure. Consequently, firms that share their databases might choose to hide these sensitive relationships prior to sharing. Ideally, the approach used to hide relationships should be impervious to as many data-mining techniques as possible, while minimizing the resulting distortion to the database. This paper focuses on frequent itemsets, the identification of which forms a critical initial step in a variety of data-mining tasks. It presents an optimal approach for hiding sensitive itemsets while keeping the number of modified transactions to a minimum. The approach is particularly attractive as it easily handles databases with millions of transactions. Results from extensive tests conducted on publicly available real data and data generated using IBM’s synthetic data generator indicate that the approach presented is very effective, optimally solving problems involving millions of transactions in a few seconds.
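To make the hiding objective concrete, the Python sketch below uses a toy database. It assumes that an itemset is frequent when its support count reaches a threshold `min_support`, and that modifying ("sanitizing") a transaction removes its support for every sensitive itemset it contains. Under those assumptions, each sensitive itemset with support s must lose at least s - min_support + 1 supporting transactions, so choosing the fewest transactions to modify becomes a covering problem. The function names and data are hypothetical. The exhaustive search is only an illustration, not the paper's method, which is reported to scale to millions of transactions.

```python
from itertools import combinations

# Toy transaction database (hypothetical data, for illustration only).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b", "d"},
    {"b", "c"},
    {"a", "b", "c", "d"},
    {"c", "d"},
]


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def min_transactions_to_hide(sensitive, transactions, min_support):
    """Return the fewest transaction indices whose sanitization hides all
    sensitive itemsets (given as frozensets).

    Assumptions: an itemset is frequent when support >= min_support, and
    sanitizing a transaction removes its support for every sensitive itemset
    it contains. Exhaustive search, so only suitable for tiny examples.
    """
    # Each sensitive itemset must lose this many supporting transactions.
    need = {s: support(s, transactions) - min_support + 1 for s in sensitive}
    # Only transactions supporting some sensitive itemset are worth modifying.
    candidates = [i for i, t in enumerate(transactions)
                  if any(s <= t for s in sensitive)]
    # Try subsets in order of increasing size; the first feasible one is minimal.
    for k in range(len(candidates) + 1):
        for chosen in combinations(candidates, k):
            if all(sum(1 for i in chosen if s <= transactions[i]) >= need[s]
                   for s in sensitive):
                return list(chosen)
    return None  # not reached when min_support >= 1


if __name__ == "__main__":
    sensitive = [frozenset({"a", "b"}), frozenset({"c", "d"})]
    print(min_transactions_to_hide(sensitive, TRANSACTIONS, min_support=2))  # [0, 1, 4]
```

Transaction 4 is selected because it supports both sensitive itemsets. An exact formulation can exploit this kind of overlap, whereas hiding one itemset at a time might miss it and modify an extra transaction.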
Crowdsourcing contests have emerged as an innovative way for firms to solve business problems by acquiring ideas from participants external to the firm. As the number of participants on crowdsourcing contest platforms has increased, so has the number of tasks that are open at any time. This has made it difficult for solvers to identify tasks in which to participate. We present a framework to recommend tasks to solvers who wish to participate in crowdsourcing contests. The existence of competition among solvers is an important and unique aspect of this environment, and our framework considers the competition a solver would face in each open task. As winning a task depends on performance, we identify a theory of performance and reinforce it with theories from learning, motivation, and tournaments. This augmented theory of performance guides us to variables specific to crowdsourcing contests that could impact a solver's winning probability. We use these variables as input into various probability prediction models adapted to our context, and make recommendations based on either the probability or the expected payoff of the solver winning an open task. We validate our framework using data from a real crowdsourcing platform. The recommender system is shown to have the potential to improve the success rates of solvers of all ability levels. Since recommendations must be made while tasks are still open, we also find that the relative rankings of tasks at similar stages of their timelines remain remarkably consistent when the tasks close. Further, we show that deploying such a system should benefit not only solvers but also seekers and the platform itself.
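The final recommendation step can be sketched as a simple ranking. In the sketch below, each open task carries a predicted win probability that is assumed to come from one of the probability prediction models (not shown). The `OpenTask` type, the `recommend` function, and the prize figures are all hypothetical.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class OpenTask:
    """An open contest task as seen by one solver (hypothetical fields)."""
    task_id: str
    prize: float     # award paid to the winner (illustrative units)
    win_prob: float  # predicted chance of winning, assumed output of a fitted model

    @property
    def expected_payoff(self) -> float:
        # Expected payoff = probability of winning times the prize.
        return self.win_prob * self.prize


# The two ranking criteria mentioned in the text.
RANKING_KEYS = {
    "probability": attrgetter("win_prob"),
    "payoff": attrgetter("expected_payoff"),
}


def recommend(tasks, k, by="probability"):
    """Return the ids of the top-k open tasks under the chosen criterion."""
    if by not in RANKING_KEYS:
        raise ValueError(f"by must be one of {sorted(RANKING_KEYS)}")
    ranked = sorted(tasks, key=RANKING_KEYS[by], reverse=True)
    return [t.task_id for t in ranked[:k]]


tasks = [
    OpenTask("T1", prize=500.0, win_prob=0.10),
    OpenTask("T2", prize=100.0, win_prob=0.40),
    OpenTask("T3", prize=1000.0, win_prob=0.08),
]

if __name__ == "__main__":
    print(recommend(tasks, 2, by="probability"))  # ['T2', 'T1']
    print(recommend(tasks, 2, by="payoff"))       # ['T3', 'T1']
```

With these numbers, ranking by probability favors the low-prize T2, while ranking by expected payoff favors T3. The two criteria can therefore produce different recommendations for the same solver.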
A common problem encountered in paper-production facilities is that of allocating customer orders to machines so as to minimize the total cost of production. It can be formulated as a dual-angular integer program, with identical machines inducing symmetry. While the potential advantages of decomposing large mathematical programs into smaller subproblems have long been recognized, the solution of decomposable integer programs remains extremely difficult. Symmetry intensifies the difficulty. This paper develops an approach, based on the construction of tight subproblem bounds, to solve decomposable dual-angular integer programs and successfully applies it to solve the problem from the paper industry. This method is of particular interest as it significantly reduces the impact of symmetry.
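For reference, a dual-angular integer program has the generic form below. Linking variables y appear in every block of constraints, while each block k has its own variables x_k. This is the textbook structure, not the paper's specific formulation, and reading each block as one machine is an assumption made for illustration.

```latex
\begin{aligned}
\min_{y,\,x_1,\dots,x_K}\quad & c_0^\top y + \sum_{k=1}^{K} c_k^\top x_k \\
\text{s.t.}\quad & A_k\, y + B_k\, x_k \ge b_k, \qquad k = 1,\dots,K, \\
& y,\; x_1,\dots,x_K \ \text{integer}.
\end{aligned}
```

Once y is fixed, the problem separates into K independent subproblems in the x_k. Suppose the blocks are identical, with c_k = c, A_k = A, B_k = B and b_k = b for all k, as they would be for identical machines. Then permuting the block indices of any solution gives another solution of equal cost, so a search may revisit up to K! equivalent copies of the same solution. This illustrates how identical machines can induce symmetry.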
The need to hide sensitive information before sharing databases has long been recognized. In the context of data mining, sensitive information often takes the form of itemsets that need to be suppressed before the data is released. This paper considers the problem of minimizing the number of nonsensitive itemsets lost while concealing sensitive ones. It is shown to be an intractably large version of an NP-hard problem. Consequently, a two-phased procedure that involves the solution of two smaller NP-hard problems is proposed as a practical and effective alternative. In the first phase, a sanitization problem is solved to determine how a given transaction's support for sensitive itemsets can be eliminated by removing the fewest items from it. This leads, in the second phase, to a modified frequent itemset hiding problem in which the transactions to be sanitized are selected so that the number of nonsensitive itemsets lost while concealing sensitive ones is minimized. Heuristic procedures are developed for these problems using intuition derived from their integer programming formulations. Results from computational experiments conducted on a publicly available retail data set and three large data sets generated using IBM's synthetic data generator indicate that these approaches are very effective, solving problems involving up to 10 million transactions in a short period of time. The results also show that the process of sanitization has considerable bearing on the quality of the solutions obtained.
Keywords: sensitive itemsets, hiding patterns, itemset mining, sanitization
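The first-phase sanitization problem can be illustrated on a single transaction. Every sensitive itemset contained in the transaction must lose at least one item, so removing the fewest items is a minimum hitting set problem. The Python sketch below solves it by exhaustive search, which is adequate only for short transactions. The function name and example data are hypothetical, and the paper's own procedures are heuristics informed by integer programming formulations, not this brute-force search.

```python
from itertools import combinations


def sanitize_transaction(transaction, sensitive):
    """Return a smallest set of items whose removal leaves `transaction`
    supporting none of the `sensitive` itemsets (all given as frozensets).

    Minimum hitting set: each sensitive itemset contained in the transaction
    must lose at least one item. Exhaustive search over candidate items.
    """
    contained = [s for s in sensitive if s <= transaction]
    if not contained:
        return set()  # nothing to hide in this transaction
    # Only items belonging to some contained sensitive itemset can help.
    pool = sorted(set().union(*contained))
    # Try removals in order of increasing size; the first that hits every
    # contained sensitive itemset is a minimum one.
    for k in range(1, len(pool) + 1):
        for removal in combinations(pool, k):
            r = set(removal)
            if all(s & r for s in contained):
                return r
    return set(pool)  # not reached: removing the whole pool always works


if __name__ == "__main__":
    sensitive = [frozenset("ab"), frozenset("bc"), frozenset("de"), frozenset("af")]
    t = frozenset("abcde")
    removal = sanitize_transaction(t, sensitive)
    print(sorted(removal), sorted(t - removal))  # ['b', 'd'] ['a', 'c', 'e']
```

In the usage example, removing b and d breaks all three contained sensitive itemsets. The itemset {a, f} is ignored because the transaction does not contain f.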
Historically, the use of peer-to-peer (P2P) networks has been limited primarily to user-initiated exchanges of (mostly music) files over the Internet. This traditional view of P2P networks is changing, however, and the use of P2P networks has been suggested for delivering general-purpose content over the Web (or corporate intranets), even in real time. We analyze sharing in a P2P community in this new context under three different congestion measures: delay, jitter, and packet loss. Studying sharing in the presence of congestion is important because most existing research on P2P networks treats network congestion as a relatively insignificant criterion. However, when general-purpose content is delivered, congestion and its relationship to sharing are critical factors that influence end-user performance. This paper looks at P2P networks from this new perspective by explicitly considering the effects of congestion on user incentives for sharing. We also propose a simple incentive mechanism that induces socially optimal sharing.