2017
DOI: 10.1007/978-3-319-56608-5_35

Enhancing Sensitivity Classification with Semantic Features Using Word Embeddings

Abstract: Government documents must be reviewed to identify any sensitive information they may contain, before they can be released to the public. However, traditional paper-based sensitivity review processes are not practical for reviewing born-digital documents. Therefore, there is a timely need for automatic sensitivity classification techniques, to assist the digital sensitivity review process. However, sensitivity is typically a product of the relations between combinations of terms, such as who said what…

Cited by 26 publications (32 citation statements)
References 19 publications (44 reference statements)
“…Feature engineering for sensitivity classification was subsequently investigated further by McDonald et al [5]. In that work, the authors constructed document representations using word embeddings to capture semantic relations in the documents, such as who said what about whom.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, in the era of born-digital documents such as e-mail, this purely manual review is not feasible [1], for example due to the volume of digital documents that are to be reviewed. Recently, automatic sensitivity classification algorithms have been shown to have potential for effectively identifying sensitive information in documents [2][3][4][5]. However, the potential consequences from the inadvertent release of sensitive information can be severe, for example if the identity of an informant is made public it can put the informant and their family at risk.…”
Section: Introduction (mentioning)
confidence: 99%
“…Assisting the sensitivity review of digital government documents has received some attention in the literature in recent years [4][5][6][7][8][9]. Most of that work has focused on developing classification algorithms for identifying sensitivity, either at the document level [5,7] or sensitive text within documents [9].…”
Section: Related Work (mentioning)
confidence: 99%
“…Most of that work has focused on developing classification algorithms for identifying sensitivity, either at the document level [5,7] or sensitive text within documents [9]. Berardi et al [8] investigated improving the cost-effectiveness of sensitivity reviewers by deploying a utility-theoretic [10] semi-automatic text classification approach to identify a ranking strategy that can maximise the overall classification effectiveness when a reviewer corrects a portion of misclassified documents, i.e.…”
Section: Related Work (mentioning)
confidence: 99%
“…During the poster session, Graham presented his evaluation of the effectiveness of semantic word embedding features, along with term and grammatical features, for sensitivity classification. On a test collection of government documents containing real sensitivities, he showed that extending text classification with semantic features and additional term n-grams results in significant improvements in classification effectiveness, correctly classifying 9.99% more sensitive documents compared to the text classification baseline [20][21][22].…”
Section: An Initial Investigation Into Fixed and Adaptive Stopping St… (mentioning)
confidence: 99%