2010 IEEE Second International Conference on Social Computing 2010
DOI: 10.1109/socialcom.2010.167

Data Leak Prevention through Named Entity Recognition

Abstract: The rise of the social web has brought a series of privacy concerns and threats. In particular, data leakage is a risk that affects the privacy of not only companies but also individuals. Although there are tools that can prevent data loss, they require a prior step in which the sensitive data is properly identified. In this paper, we propose a new automatic approach that applies Named Entity Recognition (NER) to prevent data leaks. We conduct an empirical study with real-world data and show that this NER-…

citations
Cited by 40 publications
(18 citation statements)
references
References 15 publications
“…Their offered solution can be based on classifiers they constructed throughout their study which can identify tweets containing private information, such as vacation plans. Moreover, Gómez-Hidalgo et al [59] used Named Entity Recognition (NER) algorithms to prevent data leakage. In their study, they implemented a prototype to demonstrate how their methods can prevent data leakage.…”
Section: Academic Solutions (mentioning)
confidence: 99%
“…We are not the first to explore applying machine learning toward sensitive text classification. However, previous authors, e.g., [6], [7], [8], have discussed the problem and proposed solutions for approaching it. Unfortunately, real sensitive text datasets are generally kept private, so these works were not able to perform realistic evaluations, instead using fictitious "sensitive" texts (e.g., collected from public Twitter feeds).…”
Section: Related Work (mentioning)
confidence: 99%
“…Hart et al [9] proposed a new training method to overcome the problem of imbalanced data by implementing class-specific classifiers. Gomez-Hidalgo et al [10] proposed the usage of named entity recognition to detect "sensitive" tweets.…”
Section: Related Work (mentioning)
confidence: 99%