2020
DOI: 10.1145/3418683

Combinatorial Algorithms for String Sanitization

Abstract: String data are often disseminated to support applications such as location-based service provision or DNA sequence analysis. This dissemination, however, may expose sensitive patterns that model confidential knowledge (e.g., trips to mental health clinics from a string representing a user’s location history). In this article, we consider the problem of sanitizing a string by concealing the occurrences of sensitive patterns, while maintaining data utility, in two settings that are relevant to many common strin…

Cited by 6 publications (15 citation statements)
References 22 publications
“…We used the following publicly available datasets: MSNBC (MSN), which contains page categories visited by users on msnbc.com over a 24-hour period; the complete genome of Escherichia coli (EC); KASANDR (KAS), which contains product categories in the Kelkoo price comparison site; and a dataset containing 27 Primate mitochondrial genomes (PR). MSN was used in the work of Gkoulalas-Divanis and Loukides [2011], Gwadera et al [2013], and Loukides and Gwadera [2015], EC was used in the work of Bernardini et al [2020a], [KAS] was used in the work of Sidana et al [2017], and PR was used in the work of Thankachan et al [2017]. We also generated a uniformly random string of length 100M over an alphabet of size 10, and used its prefixes of length 1M, .…”
Section: Experimental Setup and Datasets (mentioning)
confidence: 99%
“…To alleviate these concerns and comply with legislation such as HIPAA [U.S. Department of Health & Human Services 1996] in the United States and GDPR [European Parliament 2015] in the European Union, it is necessary to guarantee that using data structures does not lead to the reconstruction of the stored individuals' data. This is a fundamentally different privacy goal than that of existing privacy-preserving techniques, such as anonymization [Chen et al 2012a, b; Heatherly et al 2013; Xu et al 2016], sanitization [Bernardini et al 2019, 2020a, 2020c, 2020d; Bonomi et al 2016; Gkoulalas-Divanis and Loukides 2011; Gwadera et al 2013; Loukides and Gwadera 2015; Wang et al 2013], query auditing [Nabar et al 2008], or access control [Bertino et al 2011]. Anonymization aims at preventing the disclosure of individuals' identities and/or sensitive information.…”
Section: Introduction (mentioning)
confidence: 99%
“…A4. Preventing the detection of confidential communities by sanitizing a graph prior to its dissemination, in the spirit of sanitization works on transaction [21] or sequential data [4]. The edges identified in the output of the CB problem must be removed to hide these communities in the sanitized graph.…”
Section: Introduction (mentioning)
confidence: 99%
“…(b) The subgraph induced by all edges except the (dashed) edge (0, 4) is the maximal 4-truss of the graph. (c) The graph obtained after removing the set {(0, 1), (3, 4), (5, 6)} of (dashed) edges contains no 4-truss. (d) The graph obtained after removing the set {(3, 6), (5, 7)} of (dashed) edges is a graph in which the (gray) nodes 5 and 6 are not contained in any 4-truss.…”
Section: Introduction (mentioning)
confidence: 99%