Abedelaziz MOHAISEN†a), Nonmember, Nam-Su JHO††b), Member, Dowon HONG††c), Nonmember, and DaeHun NYANG†††d), Member

SUMMARY Privacy preserving association rule mining algorithms have been designed for discovering the relations between variables in data while maintaining data privacy. In this article, we revisit one of the recently introduced schemes for association rule mining using fake transactions (FS). In particular, our analysis shows that the FS scheme has excessive storage and high computation requirements for guaranteeing a reasonable level of privacy. We introduce a realistic definition of privacy that benefits from average-case privacy and motivates the study of a weakness in the structure of FS through fake transaction filtering. To overcome this problem, we improve the FS scheme by presenting a hybrid scheme that considers both privacy and resources as two concurrent guidelines. Analytical and empirical results show the efficiency and applicability of our proposed scheme.

key words: privacy preservation, association rule mining, data sharing, resources efficiency, performance evaluation
Introduction

Data mining is a powerful tool for discovering knowledge, such as hidden predictive information, patterns, and correlations, from large databases [1]. However, since the data itself may include information that leads to user identification, privacy preserving data mining (PPDM) has attracted great interest [2]. In the PPDM setting, not only the accuracy of the data mining algorithms but also the privacy of the data is considered essential [3]. Since the first work on PPDM by Agrawal et al. [2], several algorithms have been developed to address privacy in several settings. These algorithms are classified into cryptographic and non-cryptographic (randomization-based) algorithms [4]. The cryptographic algorithms for PPDM are [24].

Association rule mining (ARM) is a well-researched method for discovering interesting relations between variables in large databases. When the privacy concern is added to ARM, privacy preserving association rule mining (PP-ARM) aims to discover such relations between the variables in data while maintaining its privacy. To do so, several algorithms have been introduced, including those previously cited in [15]-[20], [25]. Among these works, Rizvi et al. introduced MASK for PP-ARM [15]. In MASK (referred to as PS), each bit in each transaction is flipped to its binary complement with probability p or kept as it is with probability 1 − p (see Sect. 3 for details). Accordingly, the mining algorithm is modified so that approximations of support and confidence are computed over the modified data given p. This algorithm has two advantages: (1) the privacy is quantified based on a sound mathematical definition, and (2) it does not require any memory overhead. However, its shortcoming is that the maximum achievable privacy is bounded (up to 0.89).

In another work (referred to as FS), Lin et al. used fake transactions to anonymize original data transactions for PP-ARM [18]. The FS...