Proceedings of the 24th ACM International on Conference on Information and Knowledge Management 2015
DOI: 10.1145/2806416.2806597

Semi-Automated Text Classification for Sensitivity Identification

Abstract: Sensitive documents are those that cannot be made public, e.g., for personal or organizational privacy reasons. For instance, documents requested through Freedom of Information mechanisms must be manually reviewed for the presence of sensitive information before their actual release. Hence, tools that can assist human reviewers in spotting sensitive information are of great value to government organizations subject to Freedom of Information laws. We look at sensitivity identification in terms of semi-automated…

Cited by 17 publications (36 citation statements); references 9 publications.

“…As our classifier, we use SVM with a linear kernel and C = 1.0, since this theoretically motivated, default, parameter setting has been shown to provide the best effectiveness for text classification [2,24]. We select F2 as our main metric since sensitivity classification is a recall-oriented task [3,4], where the consequences of misclassifying a sensitive document are much greater than misclassifying a not-sensitive document. We also report the standard F-measure (F1) and, to account for class imbalance, we report Balanced Accuracy (BAC), where 0.5 BAC is random.…”
Section: Methods (mentioning)
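The three metrics named in the statement above can be computed directly from confusion-matrix counts. The following is a minimal Python sketch, assuming the sensitive class is treated as the positive class; the function names `f_beta` and `balanced_accuracy` are illustrative, not taken from the cited work. F_β attaches β times as much importance to recall as to precision, so β = 2 matches the recall-oriented framing, and BAC averages the per-class recalls so that the majority class cannot dominate.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta from confusion counts; beta > 1 weights recall above precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    # Define F-beta as 0.0 when there are no positives at all (empty denominator).
    return (1 + b2) * tp / denom if denom else 0.0


def balanced_accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Mean of recall on the sensitive (positive) and not-sensitive (negative) classes."""
    tpr = tp / (tp + fn) if tp + fn else 0.0  # recall on sensitive documents
    tnr = tn / (tn + fp) if tn + fp else 0.0  # recall on not-sensitive documents
    return (tpr + tnr) / 2
```

With 30 true positives, 20 false positives, 10 false negatives and 140 true negatives, precision is 0.6 and recall is 0.75, so F2 (about 0.714) exceeds F1 (about 0.667) and BAC is 0.8125. A classifier that labels every one of the same 200 documents not-sensitive reaches 80% accuracy but a BAC of exactly 0.5 and an F2 of 0, which illustrates why BAC is reported under class imbalance.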
“…Berardi et al. [4] built on the work of McDonald et al. [3] to optimise the cost-effectiveness of sensitivity reviewers. In that work, Berardi et al. deployed a utility-theoretic ranking approach for semi-automatic text classification [8].…”
Section: Related Work (mentioning)
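The utility-theoretic ranking mentioned above orders documents so that reviewers inspect first those whose correction is expected to improve effectiveness the most. The sketch below is a simplified illustration in that spirit, not a reproduction of Berardi et al.'s method: it assumes a classifier that outputs a calibrated probability `p_sensitive` per document (a hypothetical field), estimates the contingency table from those probabilities, and scores each document as the probability that its automatic label is wrong times the F_β gain of correcting one such error. The `Doc` class and `rank_for_review` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    p_sensitive: float  # hypothetical calibrated P(sensitive | doc) from the classifier


def f_beta(tp: float, fp: float, fn: float, beta: float = 1.0) -> float:
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def rank_for_review(docs: list[Doc], beta: float = 1.0,
                    threshold: float = 0.5) -> list[tuple[str, float]]:
    """Rank documents by expected F-beta gain if a reviewer corrects their label."""
    pos = [d for d in docs if d.p_sensitive >= threshold]  # auto-labelled sensitive
    neg = [d for d in docs if d.p_sensitive < threshold]   # auto-labelled not-sensitive
    # Expected contingency counts, estimated from the probabilities themselves.
    tp = sum(d.p_sensitive for d in pos)
    fp = sum(1 - d.p_sensitive for d in pos)
    fn = sum(d.p_sensitive for d in neg)
    base = f_beta(tp, fp, fn, beta)
    # Gain from fixing one false positive / one false negative (counts clamped at 0).
    gain_fp = f_beta(tp, max(fp - 1, 0.0), fn, beta) - base
    gain_fn = f_beta(tp + 1, fp, max(fn - 1, 0.0), beta) - base
    scored = []
    for d in docs:
        if d.p_sensitive >= threshold:
            utility = (1 - d.p_sensitive) * gain_fp  # P(doc is a false positive) * gain
        else:
            utility = d.p_sensitive * gain_fn        # P(doc is a false negative) * gain
        scored.append((d.doc_id, utility))
    # Highest expected utility first: these are the documents to review first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For four documents a, b, c, d with `p_sensitive` of 0.9, 0.55, 0.45 and 0.05, the order with `beta=1.0` is c, b, a, d: the two borderline documents come first because their automatic labels are the most likely to be wrong. With `beta=2.0` the order becomes c, b, d, a, because a missed sensitive document now costs more than a wrongly flagged one.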