A semantic-preserving differentially private method for releasing query logs

Sánchez, David; Batet, Montserrat; Viejo, Alexandre; Rodríguez-García, Mercedes; Castellà‐Roca, Jordi

doi:10.1016/j.ins.2018.05.046

Cited by 6 publications

(5 citation statements)

References 44 publications


“…In this approach, the idea was to preserve the semantics of the query logs through the differential privacy-based method by preserving the cardinality and granularity as the original query logs. Differential privacy is a better method than the k-anonymity method to protect privacy [45]. However, in this paper, we are improving the MDAV approach and, in the future, we will study the differential privacy aspect in the same model.…”

Section: Related Work (mentioning)

confidence: 99%

“…Moreover, several algorithms are proposed that microaggregate a complete record or a specific group of attributes within a record to preserve privacy. Sánchez et al [45] proposed another differential privacy-based method to preserve the privacy of the query logs. In this approach, the idea was to preserve the semantics of the query logs through the differential privacy-based method by preserving the cardinality and granularity as the original query logs.…”

Section: Related Work (mentioning)

confidence: 99%

“…Therefore, such data can be exploited for unfair purposes, hence, it raises disclosure or privacy risks for the individuals [44]. Usually, the datasets that hold PII comprise of the following type of key variables [45,48]: (i) identifying variables (value of such variables identifies a person e.g., name, social security number, age, date of birth, etc.), and (ii) sensitive variables (as defined by the legislation [15,19], values of such variables may cause legal implications e.g., political views, religious views, diseases, etc).…”

Section: Introduction (mentioning)

confidence: 99%


Multivariate Microaggregation of Set-Valued Data

Daud, Shaheen, Ahmed. 2022. ITC.


Data controllers manage immense data, and occasionally, it is released publically to help the researchers to conduct their studies. However, this publically shared data may hold personally identifiable information (PII) that can be collected to re-identify a person. Therefore, an effective anonymization mechanism is required to anonymize such data before it is released publically. Microaggregation is one of the Statistical Disclosure Control (SDC) methods that are widely used by many researchers. This method adapts the k-anonymity principle to generate k-indistinguishable records in the same clusters to preserve the privacy of the individuals. However, in these methods, the size of the clusters is fixed (i.e., k records), and the clusters generated through these methods may hold non-homogeneous records. By considering these issues, we propose an adaptive size clustering technique that aggregates homogeneous records in similar clusters, and the size of the clusters is determined after the semantic analysis of the records. To achieve this, we extend the MDAV microaggregation algorithm to semantically analyze the unstructured records by relying on the taxonomic databases (i.e., WordNet), and then aggregating them in homogeneous clusters. Furthermore, we propose a distance measure that determines the extent to which the records differ from each other, and based on this, homogeneous adaptive clusters are constructed. In experiments, we measured the cohesiveness of the clusters in order to gauge the homogeneity of records. In addition, a method is proposed to measure information loss caused by the redaction method. In experiments, the results show that the proposed mechanism outperforms the existing state-of-the-art solutions.
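The fixed-size limitation mentioned in this abstract can be illustrated with a minimal sketch of univariate microaggregation on numeric values. This is not MDAV and not the adaptive, semantic method the authors propose; the function name `microaggregate` and the sort-and-block grouping are assumptions chosen only to show how groups of at least k records are replaced by a shared centroid.

```python
from statistics import mean


def microaggregate(values, k):
    """Illustrative fixed-size microaggregation (hypothetical helper, not MDAV).

    Sorts the values, cuts them into consecutive blocks of k records, and
    replaces every value with the mean of its block, so each released value
    is shared by at least k records.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(values) < k:
        raise ValueError("need at least k records")

    # Indices of the records, ordered by value.
    order = sorted(range(len(values)), key=lambda i: values[i])

    # Consecutive blocks of exactly k indices (the last one may be shorter).
    groups = [order[i:i + k] for i in range(0, len(order), k)]

    # Merge a short trailing block into the previous one so that no group
    # falls below k records.
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())

    # Replace each record with the centroid (mean) of its group.
    result = [0.0] * len(values)
    for group in groups:
        centroid = mean(values[i] for i in group)
        for i in group:
            result[i] = centroid
    return result


if __name__ == "__main__":
    print(microaggregate([10, 1, 2, 11, 12, 3, 20], k=3))
```

In this toy input the outlier 20 is forced into the group of 10, 11 and 12, which pulls that centroid upward; this is the kind of non-homogeneous cluster that motivates an adaptive group size.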


“…The differential privacy approach can also be applied to anonymize data-streams [66]. In this case, there is no release of the original query, but a synthetic one, obtained using semantic similarity.…”

Section: B: Differential Privacy (mentioning)

confidence: 99%

A Real-Time Query Log Protection Method for Web Search Engines

2020


Web search engines (e.g., Google, Bing, Qwant, and DuckDuckGo) may process a myriad of search queries per second. According to Internet Live Stats, Google handles more than two hundred million queries per hour, i.e., about two trillion queries per year. For monetization purposes, the queries can be stored and complemented with additional data, referred to as query logs. Together, they can correlate valuable information to build accurate user profiles. Before releasing the query logs to third parties (e.g., for profit purposes), the personal information contained in the query logs must be properly protected by the web search engines. Current regulations establish strict control, and require from provable anonymization processing (e.g., in terms of statistical disclosure) of any personally identifiable information. In this paper, we tackle this challenge. We propose a real-time anonymization solution to protect streams of unstructured data at the server side. Our approach is based on the use of a probabilistic k-anonymity technique. It allows probabilistic processing of personally identifiable attributes contained in the query logs, with provable privacy properties. Our solution handles limitations of traditional k-anonymity approaches with respect to unstructured data and real-time processing. We present the implementation of our solution and report experimental evaluation results. The evaluation is conducted in terms of privacy, utility, and scalability achievement. Results validate the feasibility of our proposal.


“…This framework was able to achieve a good balance between retrieval utility and privacy; however, the framework was only empirically evaluated against multiple search algorithms on their retrieval utility. Sánchez et al [53] proposed a privacy-preserving method of query logs that joined the flexibility and convenience of privacy-preserving data releasing with the strong privacy guarantees of differential privacy. Although this method produced query logs with differential privacy that was useful for data analysis, the major challenge was that the microaggregation algorithm did adapt to the lack of structure of query logs.…”

Section: Related Work (mentioning)

confidence: 99%

Privacy-Preserving Monotonicity of Differential Privacy Mechanisms

Liu, Zhou, et al. 2018. Applied Sciences.


Differential privacy mechanisms can offer a trade-off between privacy and utility by using privacy metrics and utility metrics. The trade-off of differential privacy shows that one thing increases and another decreases in terms of privacy metrics and utility metrics. However, there is no unified trade-off measurement of differential privacy mechanisms. To this end, we proposed the definition of privacy-preserving monotonicity of differential privacy, which measured the trade-off between privacy and utility. First, to formulate the trade-off, we presented the definition of privacy-preserving monotonicity based on computational indistinguishability. Second, building on privacy metrics of the expected estimation error and entropy, we theoretically and numerically showed privacy-preserving monotonicity of Laplace mechanism, Gaussian mechanism, exponential mechanism, and randomized response mechanism. In addition, we also theoretically and numerically analyzed the utility monotonicity of these several differential privacy mechanisms based on utility metrics of modulus of characteristic function and variant of normalized entropy. Third, according to the privacy-preserving monotonicity of differential privacy, we presented a method to seek trade-off under a semi-honest model and analyzed a unilateral trade-off under a rational model. Therefore, privacy-preserving monotonicity can be used as a criterion to evaluate the trade-off between privacy and utility in differential privacy mechanisms under the semi-honest model. However, privacy-preserving monotonicity results in a unilateral trade-off of the rational model, which can lead to severe consequences.
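As a rough illustration of the privacy-utility trade-off this abstract discusses, the sketch below implements the standard Laplace mechanism and the expected absolute noise it adds, which equals sensitivity / epsilon. The function names are hypothetical, and expected absolute noise is used here only as one simple error measure; it is not claimed to be the privacy or utility metric defined in the paper.

```python
import random


def laplace_mechanism(true_value, sensitivity, epsilon, rng=random):
    """Return true_value plus Laplace noise of scale sensitivity / epsilon.

    The noise is sampled as the difference of two independent exponential
    variables with rate 1 / scale, which follows a Laplace(0, scale) law.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_value + noise


def expected_abs_error(sensitivity, epsilon):
    """Expected absolute value of Laplace(0, sensitivity / epsilon) noise."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return sensitivity / epsilon


if __name__ == "__main__":
    rng = random.Random(0)
    for eps in (0.1, 0.5, 1.0, 2.0):
        noisy = laplace_mechanism(100.0, sensitivity=1.0, epsilon=eps, rng=rng)
        print(eps, expected_abs_error(1.0, eps), noisy)
```

A smaller epsilon gives a stronger privacy guarantee and a larger expected error, so the two quantities move in opposite directions as epsilon varies.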
