2017
DOI: 10.1109/tkde.2016.2628180

Scalable Iterative Classification for Sanitizing Large-Scale Datasets

Abstract: Cheap ubiquitous computing enables the collection of massive amounts of personal data in a wide variety of domains. Many organizations aim to share such data while obscuring features that could disclose personally identifiable information. Much of this data exhibits weak structure (e.g., text), such that machine learning approaches have been developed to detect and remove identifiers from it. While learning is never perfect, and relying on such approaches to sanitize data can leak sensitive information, a smal…

Cited by 10 publications (4 citation statements)
References 52 publications
“…Cheap ubiquitous computing enables the collection of massive amounts of personal data in a wide variety of domains [80]. In cloud platforms, most of the time, BD can not be reached because of the privacy and security concerns and effective tools are being deployed to detect and respond faster to cyberthreats, attacks, breaches of data.…”
Section: Discussion
mentioning, confidence: 99%
“…Similarly, Jiang et al [38] have developed the t-plausibility algorithm to replace the known (labeled) sensitive identifiers within the documents and guarantee that the sanitized document is associated with at least t documents. Li et al [42] have proposed a game theoretic framework for automatic redacting sensitive information. In general, finding and redacting sensitive information with high accuracy is still challenging.…”
Section: Differentially Private Learning Methods
mentioning, confidence: 99%
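The redaction approaches cited above share a common core: known, labeled sensitive identifiers are replaced with more general surrogates so that the sanitized text no longer points to a single individual. The following is a minimal illustrative sketch only, under assumed inputs; it is not the algorithm of [38], [42], or the present paper, and the function name, label map, and example note are invented for illustration.

import re

def redact_labeled_identifiers(text, identifier_map):
    """Replace each labeled sensitive identifier with its generalized surrogate.

    identifier_map: dict mapping sensitive strings (e.g., names, dates)
    to generalized replacements (e.g., "[PERSON]", "[DATE]").
    This is a toy sketch, not a production sanitizer.
    """
    # Replace longer identifiers first so that substrings of longer
    # identifiers are not clobbered prematurely.
    for identifier in sorted(identifier_map, key=len, reverse=True):
        pattern = re.compile(re.escape(identifier), flags=re.IGNORECASE)
        text = pattern.sub(identifier_map[identifier], text)
    return text

if __name__ == "__main__":
    # Hypothetical clinical note with labeled identifiers.
    note = "Patient John Doe was admitted on 2016-03-14 at Mercy Hospital."
    labels = {
        "John Doe": "[PERSON]",
        "2016-03-14": "[DATE]",
        "Mercy Hospital": "[HOSPITAL]",
    }
    print(redact_labeled_identifiers(note, labels))
    # Output: Patient [PERSON] was admitted on [DATE] at [HOSPITAL].

In practice, as the citing authors note, the hard part is not the replacement step but finding the sensitive spans with high accuracy in the first place, which is where learned detectors (and their imperfections) come in.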
“…Despite its novel advantage in condensing the information of the entire dataset in a smaller dataset, dataset distillation is essentially a DNN model (see Section II). Previous studies [42], [47] have shown that DNN models (e.g., image classifiers, language models) are vulnerable to security and privacy attacks, such as adversarial attacks [21], [35], [55], inference attacks [18], [26], [54], [59], [61], [66], backdoor attacks [24], [60], [76], [86]. Yet, existing dataset distillation efforts [5], [52], [53], [89] mainly focus on designing new algorithms to distill a large dataset better.…”
Section: Introduction
mentioning, confidence: 99%