Proceedings of the 5th ACM Conference on Data and Application Security and Privacy 2015
DOI: 10.1145/2699026.2699106

Privacy-Preserving Scanning of Big Content for Sensitive Data Exposure with MapReduce

Abstract: The exposure of sensitive data in storage and transmission poses a serious threat to organizational and personal security. Data leak detection aims at scanning content (in storage or transmission) for exposed sensitive data. Because of the large content and data volume, such a screening algorithm needs to be scalable for a timely detection. Our solution uses the MapReduce framework for detecting exposed sensitive content, because it has the ability to arbitrarily scale and utilize public resources for the task…
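
The abstract is truncated and does not spell out the algorithm, so the following is only a rough illustration of how a MapReduce-style scan for exposed sensitive data could be structured. It is a single-process Python sketch that assumes (this is an assumption, not the authors' stated method) that both the sensitive data and the scanned content are reduced to SHA-256 digests of fixed-length character shingles, and that a leak is indicated by digests shared between the two. The names `digests`, `map_phase`, `shuffle`, `reduce_phase`, `scan`, and the shingle length `N` are all hypothetical.

```python
import hashlib
from collections import defaultdict

N = 8  # assumed shingle length in characters (hypothetical parameter)


def digests(text, n=N):
    """Return the set of SHA-256 digests of every n-character shingle in text."""
    return {
        hashlib.sha256(text[i:i + n].encode("utf-8")).hexdigest()
        for i in range(len(text) - n + 1)
    }


def map_phase(doc_id, text, kind):
    """Emit (digest, (kind, doc_id)) pairs; kind is 'sensitive' or 'content'."""
    for d in digests(text):
        yield d, (kind, doc_id)


def shuffle(pairs):
    """Group emitted values by key, standing in for the MapReduce runtime."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Count, for each (content, sensitive) pair, the digests present in both."""
    counts = defaultdict(int)
    for values in groups.values():
        sens = {doc for kind, doc in values if kind == "sensitive"}
        cont = {doc for kind, doc in values if kind == "content"}
        for c in cont:
            for s in sens:
                counts[(c, s)] += 1
    return dict(counts)


def scan(sensitive, content):
    """Run map, shuffle, and reduce over two {doc_id: text} dictionaries."""
    pairs = []
    for doc_id, text in sensitive.items():
        pairs.extend(map_phase(doc_id, text, "sensitive"))
    for doc_id, text in content.items():
        pairs.extend(map_phase(doc_id, text, "content"))
    return reduce_phase(shuffle(pairs))
```

In this sketch the workers only ever compare digests, which is one simple way to limit what a scanning node sees; whether plain hashing gives adequate privacy depends on the threat model and is not claimed here. A content document that embeds a sensitive string in full would share every digest of that string, while unrelated content would not appear in the result at all.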

Cited by 44 publications (19 citation statements)
References 38 publications

“…Machine learning algorithms are also used in context‐based DLPD approaches. In the era of big data, the most severe problem of content‐based DLPD approaches is scalability, i.e., they are not able to process massive content data in time…”
Section: DLPD Techniques (mentioning)
confidence: 99%
“…To address the above challenges, we now introduce a privacy preserving data leak detection system as a case study, named MapReduce‐based Data Leak Detection (MR‐DLD). It utilizes the MapReduce distributed computing framework to inspect sensitive content for inadvertent data leak detection, and can be deployed either in local computer clusters or in the Cloud.…”
Section: DLPD in the Big Data Era (mentioning)
confidence: 99%
“…The ElGamal cryptographic approach has been used in distributed fashion where semi-honest server mine the encrypted data. Liu et al [2] proposed privacy preserved scanning of big data using MapReduce framework. The technique minimizes the sensitive data exposure during the data detection for outsourcing data securely.…”
Section: Related Work (mentioning)
confidence: 99%
“…These voluminous datasets facilitate the analysis and understanding of much needed global trends and interesting patterns, for which organizations/clients may require to share their data with others. Sharing may cause exposure of sensitive information present in these datasets and might invite number of privacy threats [2] (e.g. medical records or financial records if mined can provide significant human benefits but the failure of privacy might allow malicious users or providers to misuse this information which can cause considerable economic or social loss).…”
Section: Introduction (mentioning)
confidence: 99%
“…The two assumptions can be removed when our alignment is performed utilizing secure multi-party computation or other privacy-preserving techniques [2,4]. (Footnote 2: Such channels are widely used for advanced NIDS where MITM (man-in-the-middle) SSL sessions are employed to handle encryption.) We do not aim at detecting stealthy data leaks that an attacker encrypt the sensitive data by herself before leaking it.…”
Section: Models and Overview (mentioning)
confidence: 99%