Available data may differ from the true data due to sensing errors, especially in the Internet of Things (IoT). Although privacy-preserving data mining has been widely studied during the last decade, little attention has been paid to data values that contain errors. Differential privacy, the de facto standard privacy metric, can be achieved by adding noise to a target value that must be protected. However, if the target value already contains errors, there is no reason to add extra noise. In this paper, a novel privacy model called true-value-based differential privacy (TDP) is proposed. This model applies traditional differential privacy to the "true value," which is unknown to the data owner or anonymizer, rather than to the "measured value" containing errors. Based on TDP, our solution reduces the amount of noise added by differential privacy techniques by approximately 20%. As a result, the error of generated histograms is reduced by 40.4% and 29.6% on average in terms of mean square error and Jensen-Shannon divergence, respectively. We validate these results on synthetic and five real data sets. Moreover, we prove that the privacy protection level does not decrease as long as the measurement error is not overestimated.
INDEX TERMS Data mining, Data privacy, Differential privacy, Internet of Things

I. INTRODUCTION

Significant amounts of IoT data are generated every day by many different sensors, such as thermal cameras, home appliance sensors, automotive sensors, and smartphone-equipped sensors. These IoT data can be used for health monitoring [1], context-aware recommendation (or recommender) systems [2], navigation [3], and other applications.

However, sensing people or their surrounding environment might involve information that identifies an individual [4]. Thus, private information is at risk of leakage. By anonymizing data based on ϵ-differential privacy [5], [6], which is the de facto standard privacy metric (ϵ represents the privacy budget), privacy leakage can be controlled. Differential privacy has been used in many studies, such as [7]-[9], as it is one of the most critical privacy metrics [10]. It is considered an important concept for data analysis [11], [12].

Local differential privacy is a specialized concept of differential privacy, intended especially for data collection from each person. In this paper, "differential privacy" refers to "lo-