2016 IEEE Conference on Intelligence and Security Informatics (ISI)
DOI: 10.1109/isi.2016.7745451
Automated big text security classification

Abstract: In recent years, traditional cybersecurity safeguards have proven ineffective against insider threats. Famous cases of sensitive information leaks caused by insiders, including the WikiLeaks release of diplomatic cables and the Edward Snowden incident, have greatly harmed the U.S. government's relationship with other governments and with its own citizens. Data Leak Prevention (DLP) is a solution for detecting and preventing information leaks from within an organization's network. However, state-of-the-art…

Cited by 14 publications (9 citation statements)
References 12 publications
“…The majority of the literature on sensitivity labelling is associated with Data Loss Prevention (DLP) systems [8][9][10][11][12], notably focusing on classifying sensitivity at the document level. Other research uses sensitivity classification for confidential information redaction on declassified documents [13][14][15], where classification is often performed at finer granularity that reaches the token level.…”
Section: Related Work
confidence: 99%
“…Later, McDonald et al. built a novel SVM sensitivity classifier by mixing concepts from both NLP and machine learning for government document declassification, using POS n-gram tags as a sensitivity load indicator [15]. Alzhrani et al. [9] proposed another DLP system with finer granularity. Their work effectively combined unsupervised and supervised methods to create a similarity-based classifier operating at the paragraph level and trained on an ad-hoc annotated WikiLeaks corpus.…”
Section: Related Workmentioning
confidence: 99%
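As a concrete illustration of the POS n-gram idea in the statement above, here is a minimal sketch, assuming NLTK for part-of-speech tagging and scikit-learn for the linear SVM; the toy documents, labels, and n-gram settings are placeholders of my choosing, not the cited paper's actual pipeline.

# Minimal sketch of a POS n-gram SVM sensitivity classifier. Assumes NLTK
# and scikit-learn are installed, plus the NLTK data packages
# nltk.download("punkt") and nltk.download("averaged_perceptron_tagger").
# The corpus and labels below are illustrative placeholders.
import nltk
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

def to_pos_tags(text):
    # Map each token to its POS tag so the vectorizer's n-grams range
    # over tag sequences rather than surface words.
    tokens = nltk.word_tokenize(text)
    return " ".join(tag for _, tag in nltk.pos_tag(tokens))

docs = [
    "The attache met the minister in private.",   # sensitive (placeholder)
    "Lunch is served in the cafeteria at noon.",  # non-sensitive (placeholder)
]
labels = [1, 0]

model = Pipeline([
    # Bigrams and trigrams of POS tags form the feature space.
    ("pos_ngrams", CountVectorizer(preprocessor=to_pos_tags, ngram_range=(2, 3))),
    ("svm", LinearSVC()),
])
model.fit(docs, labels)
print(model.predict(["The ambassador cabled a secret assessment."]))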
“…The data collection must also adhere to the conditions in Table 8, which shows that the data used in this study must be reported incidents, endorsed by an authorized organization that has its own third-party security auditor, and open to the public, as this indicates a collective ability to produce a predictive trend [39]. On the market, vulnerability databases can come from local or international authorities.…”
Section: Data Collection
confidence: 99%
“…In the ACESS paper [1], we generated a wide range of clusters and built a classification model based on each of these clusters. The intuition behind this was to find the group of classification models with the best results via a cross-validation technique.…”
Section: B. Automated Security Classification Enabled By Similarity (ACESS)
confidence: 99%
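The cluster-then-classify protocol described in this statement can be sketched briefly. The snippet below, assuming scikit-learn, clusters documents by TF-IDF similarity and then trains and cross-validates one local classifier per cluster; k-means and the linear SVM are illustrative stand-ins rather than the ACESS paper's exact components.

# Rough sketch of cluster-then-classify: build one local model per
# similarity cluster and score each with cross-validation. TfidfVectorizer,
# KMeans, and LinearSVC are stand-ins, not necessarily ACESS's components.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def train_local_classifiers(docs, labels, n_clusters=3, cv=3):
    X = TfidfVectorizer().fit_transform(docs)
    cluster_ids = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    y = np.asarray(labels)
    models, scores = {}, {}
    for c in range(n_clusters):
        idx = np.where(cluster_ids == c)[0]
        if len(set(y[idx])) < 2:
            continue  # a single-class cluster cannot train a classifier
        clf = LinearSVC()
        # Cross-validated score of this cluster's local model, mirroring
        # the "best group of models" selection described above.
        scores[c] = cross_val_score(clf, X[idx], y[idx], cv=cv).mean()
        models[c] = clf.fit(X[idx], y[idx])
    return models, scores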
“…Unfortunately, real sensitive text datasets are generally kept private, so these works were not able to perform realistic evaluations, instead using fictitious "sensitive" texts (e.g., collected from public Twitter feeds). To perform a more realistic evaluation of the problem, we introduced the first public sensitive text dataset in [1], consisting of diplomatic cables made public by the WikiLeaks organization, and performed evaluations using a novel system design, Automated Classification Enabled by Security Similarity (ACESS), which performs local classifier learning over clusters of similarity groups. This work, while seminal, had the disadvantage that the protocol was inherently expensive to tune hyperparameters over and had no separate validation set.…”
Section: Related Work
confidence: 99%