Automated U.S diplomatic cables security classification: Topic model pruning vs. classification based on clusters

Alzhrani, Khudran; Rudd, Ethan M.; Chow, Chee-Onn; Boult, Terrance E.

doi:10.1109/ths.2017.7943471

Cited by 6 publications

(1 citation statement)

References 8 publications

“…Similar studies later reproduced and expanded upon these results [6], [15], [1] by using the complete document contents, multiple security classification levels and per-paragraph sensitivity predictions, while the concept of controlled environments was introduced by us in [8].…”

Section: Related Work (mentioning)

confidence: 71%

Data Leakage Prevention for Secure Cross-Domain Information Exchange

et al. 2017

Abstract: Cross-domain information exchange is an increasingly important capability for conducting efficient and secure operations, both within coalitions and within single nations. A data guard is a common cross-domain sharing solution that inspects and validates that the security labels of exported data objects are such that they can be released according to policy. While we see that guard solutions can be implemented with high assurance, we find that obtaining an equivalent level of assurance in the correctness of the security labels easily becomes a hard problem in practical scenarios. Thus, a weakness of the guard-based solution is that there is often limited assurance in the correctness of the security labels. To mitigate this, guards make use of content checkers such as dirty word lists as a means for detecting mislabeled data. To improve the overall security of such cross-domain solutions we investigate more advanced content checkers based on the use of machine learning. Instead of relying on manually specified dirty word lists, we can build data-driven methods that automatically infer the words associated with classified content. However, care must be taken when constructing and deploying these methods as naive implementations are vulnerable to manipulation attacks. In order to provide a better context for performing classification, we monitor the incoming information flow and use the audit trail to construct controlled environments. The usefulness of said deployment scheme is demonstrated using a real collection of classified and unclassified documents.
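The contrast the abstract draws between a manually specified dirty word list and a data-driven checker can be illustrated with a minimal sketch. This is not the method of the cited paper: the function names, the smoothed log-odds scoring, the threshold, and the toy documents are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def dirty_word_hits(text, dirty_words):
    """Manual check: return the listed dirty words that occur in the text."""
    return sorted(set(tokenize(text)) & set(dirty_words))


def learn_word_scores(classified_docs, unclassified_docs):
    """Data-driven check (hypothetical scoring): smoothed log-odds of a word
    appearing in a classified document versus an unclassified one."""
    c_counts = Counter(w for d in classified_docs for w in set(tokenize(d)))
    u_counts = Counter(w for d in unclassified_docs for w in set(tokenize(d)))
    n_c, n_u = len(classified_docs), len(unclassified_docs)
    scores = {}
    for w in set(c_counts) | set(u_counts):
        # Add-one smoothing keeps the ratio finite for unseen words.
        p_c = (c_counts[w] + 1) / (n_c + 2)
        p_u = (u_counts[w] + 1) / (n_u + 2)
        scores[w] = math.log(p_c / p_u)
    return scores


def inferred_dirty_words(scores, threshold=0.5):
    """Words whose score exceeds the (assumed) threshold."""
    return sorted(w for w, s in scores.items() if s > threshold)


# Invented toy corpus, for illustration only.
classified = [
    "operation falcon troop movement schedule",
    "falcon asset location report",
]
unclassified = [
    "cafeteria menu schedule",
    "weekly report on parking",
]

scores = learn_word_scores(classified, unclassified)
print(dirty_word_hits("The Falcon schedule is attached", ["falcon"]))
print(inferred_dirty_words(scores, threshold=1.0))
```

A scorer this simple is an instance of the naive implementation the abstract warns about: whoever can influence the training documents can shift the word scores, which is the kind of manipulation the controlled environments are meant to limit.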
