Combinatorial Algorithms for String Sanitization

Bernardini, Giulia; Chen, Huiping; Conte, Alessio; Grossi, Roberto; Loukides, Grigorios; Pisanti, Nadia; Pissis, Solon P.; Rosone, Giovanna; Sweering, Michelle

doi:10.1145/3418683

Cited by 6 publications

(15 citation statements)

References 22 publications

“…We used the following publicly available datasets: MSNBC (MSN), which contains page categories visited by users on msnbc.com over a 24-hour period; the complete genome of Escherichia coli (EC); KASANDR (KAS), which contains product categories in the Kelkoo price comparison site; and a dataset containing 27 Primate mitochondrial genomes (PR). MSN was used in the work of Gkoulalas-Divanis and Loukides [2011], Gwadera et al [2013], and Loukides and Gwadera [2015], EC was used in the work of Bernardini et al [2020a], KAS was used in the work of Sidana et al [2017], and PR was used in the work of Thankachan et al [2017]. We also generated a uniformly random string of length 100M over an alphabet of size 10, and used its prefixes of length 1M, .…”

Section: Experimental Setup and Datasets (mentioning)

confidence: 99%

“…To alleviate these concerns and comply with legislation such as HIPAA [U.S. Department of Health & Human Services 1996] in the United States and GDPR [European Parliament 2015] in the European Union, it is necessary to guarantee that using data structures does not lead to the reconstruction of the stored individuals' data. This is a fundamentally different privacy goal than that of existing privacy-preserving techniques, such as anonymization [Chen et al 2012a, b; Heatherly et al 2013; Xu et al 2016], sanitization [Bernardini et al 2019, 2020a, 2020c, 2020d; Bonomi et al 2016; Gkoulalas-Divanis and Loukides 2011; Gwadera et al 2013; Loukides and Gwadera 2015; Wang et al 2013], query auditing [Nabar et al 2008], or access control [Bertino et al 2011]. Anonymization aims at preventing the disclosure of individuals' identities and/or sensitive information.…”

Section: Introduction (mentioning)

confidence: 99%

Reverse-Safe Text Indexing

Bernardini

Chen

Fici

et al. 2021

ACM J. Exp. Algorithmics

Self Cite

We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm that constructs a z-reverse-safe data structure (z-RSDS) that has size O(n) and answers decision and counting pattern matching queries of length at most d optimally, where d is maximal for any such z-RSDS. The construction algorithm takes O(n^ω log d) time, where ω is the matrix multiplication exponent. We show that, despite the n^ω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We also show that plugging our method in data analysis applications gives insignificant or no data utility loss. Furthermore, we show how our technique can be extended to support applications under realistic adversary models. Finally, we show a z-RSDS for decision pattern matching queries, whose size can be sublinear in n. A preliminary version of this article appeared in ALENEX 2020.
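The z-reverse-safe condition in this abstract can be illustrated with a small brute-force sketch. It is not the construction algorithm from the paper; it only checks the definition on tiny inputs by enumerating candidate texts. The function names (`substring_counts`, `count_consistent_texts`, `is_z_reverse_safe`) are hypothetical, and the sketch assumes the stored answers are the occurrence counts of every pattern of length at most d.

```python
from collections import Counter
from itertools import product


def substring_counts(text, d):
    """Count every substring of length 1..d that occurs in text."""
    counts = Counter()
    for length in range(1, d + 1):
        for i in range(len(text) - length + 1):
            counts[text[i:i + length]] += 1
    return counts


def count_consistent_texts(text, d):
    """Brute force: how many texts give the same counting answers as
    `text` for all patterns of length <= d.

    Candidates are limited to the same length and alphabet, which loses
    nothing: the length-1 counts already fix both. Exponential in
    len(text), so this is for toy inputs only.
    """
    alphabet = sorted(set(text))
    target = substring_counts(text, d)
    return sum(
        1
        for candidate in product(alphabet, repeat=len(text))
        if substring_counts("".join(candidate), d) == target
    )


def is_z_reverse_safe(text, d, z):
    """True if at least z texts are consistent with the stored answers."""
    return count_consistent_texts(text, d) >= z
```

For the text "abab", answering only single-letter counts (d = 1) leaves six consistent texts, while d = 2 pins the text down uniquely, which is the trade-off between query length d and the privacy threshold z that the abstract describes.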

“…A4. Preventing the detection of confidential communities by sanitizing a graph prior to its dissemination, in the spirit of sanitization works on transaction [21] or sequential data [4]. The edges identified in the output of the CB problem must be removed to hide these communities in the sanitized graph.…”

Section: Introduction (mentioning)

confidence: 99%

“…(b) The subgraph induced by all edges except the (dashed) edge (0, 4) is the maximal 4-truss of the graph. (c) The graph obtained after removing the set {(0, 1), (3, 4), (5, 6)} of (dashed) edges contains no 4-truss. (d) The graph obtained after removing the set {(3, 6), (5, 7)} of (dashed) edges is a graph in which the (gray) nodes 5 and 6 are not contained in any 4-truss.…”

Section: Introduction (mentioning)

confidence: 99%

On Breaking Truss-Based Communities

Chen

Conte

Grossi

et al. 2021

Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining

Self Cite

A k-truss is a graph such that each edge is contained in at least k − 2 triangles. This notion has attracted much attention, because it models meaningful cohesive subgraphs of a graph. We introduce the problem of identifying a smallest edge subset of a given graph whose removal makes the graph k-truss-free. We also introduce a problem variant where the identified subset contains only edges incident to a given set of nodes and ensures that these nodes are not contained in any k-truss. These problems are directly applicable in communication networks: the identified edges correspond to vital network connections; or in social networks: the identified edges can be hidden by users or sanitized from the output graph. We show that these problems are NP-hard. We thus develop exact exponential-time algorithms to solve them. To process large networks, we also develop heuristics sped up by an efficient data structure for updating the truss decomposition under edge deletions. We complement our heuristics with a lower bound on the size of an optimal solution to rigorously evaluate their effectiveness. Extensive experiments on 10 real-world graphs show that our heuristics are effective (close to the optimal or to the lower bound) and also efficient (up to two orders of magnitude faster than a natural baseline).
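The truss definition used in this abstract (every edge lies in at least k − 2 triangles, for a parameter k) can be made concrete with a short peeling sketch. This is a plain fixpoint loop for illustration, not the exact algorithms, heuristics, or data structure of the paper; the names `maximal_k_truss` and `is_k_truss_free` are hypothetical, and nodes are assumed to be comparable values such as integers.

```python
from collections import defaultdict


def maximal_k_truss(edges, k):
    """Return the edges of the maximal k-truss as a set of (u, v), u < v.

    Repeatedly deletes any edge whose support (number of triangles that
    contain it) is below k - 2, until no such edge remains.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                # Common neighbours of u and v = triangles through (u, v).
                if u < v and len(adj[u] & adj[v]) < k - 2:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True

    return {(u, v) for u in adj for v in adj[u] if u < v}


def is_k_truss_free(edges, k):
    """True if no non-empty k-truss survives the peeling."""
    return not maximal_k_truss(edges, k)
```

On the complete graph on four nodes with one extra pendant edge, the maximal 4-truss is the complete graph itself, and deleting any single edge of that complete graph already leaves the graph 4-truss-free. Finding a smallest such deletion set in general is the NP-hard problem the abstract refers to; the sketch only verifies a candidate set.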

String Editing Under Pattern Constraints

Barish

Shibuya

2022

Communications in Computer and Information Science
