Government documents must be manually reviewed to identify any sensitive information, e.g., confidential information, before being publicly archived. However, human-only sensitivity review is not practical for born-digital documents due to, for example, the sheer volume of documents that must be reviewed. In this work, we conduct a user study to evaluate the effectiveness of sensitivity classification for assisting human sensitivity reviewers. We evaluate how the accuracy and confidence levels of sensitivity classification affect the number of documents that are correctly judged as being sensitive (reviewer accuracy) and the time that it takes to sensitivity review a document (reviewing speed). In our within-subject study, the participants review government documents to identify real sensitivities while being assisted by three sensitivity classification treatments, namely None (no classification predictions), Medium (sensitivity predictions from a simulated classifier with a balanced accuracy (BAC) of 0.7), and Perfect (sensitivity predictions from a classifier with an accuracy of 1.0).
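Since the Medium treatment relies on a simulated classifier, one way to realise a fixed BAC is to flip ground-truth labels with equal per-class error rates. The following is a minimal sketch of that idea, assuming BAC is the mean of the per-class recalls; the function name, label encoding, and corpus proportions are hypothetical illustrations, not details taken from the study:

```python
import random

def simulate_predictions(true_labels, target_bac=0.7, seed=13):
    """Flip each ground-truth label with probability 1 - target_bac.

    If BAC is the mean of the sensitive and not-sensitive recalls,
    flipping both classes at the same rate gives an expected recall
    of target_bac for each class, and hence an expected BAC of 0.7.
    """
    rng = random.Random(seed)
    flip = 1.0 - target_bac
    return [(not y) if rng.random() < flip else y for y in true_labels]

# Illustrative corpus: 1,000 documents, 200 of them sensitive (True).
truth = [i < 200 for i in range(1000)]
predictions = simulate_predictions(truth, target_bac=0.7)
```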
Our results show that sensitivity classification leads to significant improvements (ANOVA, p < 0.05) in reviewer accuracy in terms of BAC (+37.9% Medium, +60.0% Perfect) and also in terms of F2 (+40.8% Medium, +44.9% Perfect).
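For reference, the two reviewer-accuracy metrics have standard definitions, where TP, TN, FP, and FN count outcomes over the sensitive (positive) and not-sensitive (negative) classes, and F2 is the F-beta measure with beta = 2, weighting recall more heavily than precision:

```latex
\mathrm{BAC} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right),
\qquad
F_{2} = \frac{(1 + 2^{2}) \cdot P \cdot R}{2^{2} \cdot P + R},
\quad \text{where } P = \frac{TP}{TP + FP},\; R = \frac{TP}{TP + FN}.
```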
Moreover, we show that assisting reviewers with sensitivity classification predictions leads to significantly increased (ANOVA, p < 0.05) mean reviewing speeds (+72.2% Medium, +61.6% Perfect). We find that reviewers do not agree with the classifier significantly more as the classifier's confidence increases. However, reviewing speed is significantly increased when the reviewers agree with the classifier (ANOVA, p < 0.05). Our in-depth analysis shows that when the reviewers are not assisted with sensitivity predictions, mean reviewing speeds are 40.5% slower for sensitive judgements than for not-sensitive judgements. However, when the reviewers are assisted with sensitivity predictions, this difference is reduced by roughly 10 percentage points, from 40.5% to 30.8%. We also find that, for sensitive judgements, sensitivity classification predictions significantly increase mean reviewing speeds by 37.7% when the reviewers agree with the classifier's predictions (t-test, p < 0.05). Overall, our findings demonstrate that sensitivity classification is a viable technology for assisting human reviewers with the sensitivity review of digital documents.
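As a minimal sketch of the kind of significance test reported above, a paired comparison of per-participant mean reviewing speeds between two treatments could be run as follows; the values and units are placeholders rather than the study's data, and the study's full analysis (ANOVA across three treatments in a within-subject design) is more involved than this two-condition t-test:

```python
from scipy import stats

# Hypothetical per-participant mean reviewing speeds under two
# treatments (placeholder values and units, not the study's data).
speed_none = [55.0, 48.2, 61.3, 50.7, 58.9, 47.4]
speed_medium = [88.1, 79.5, 97.0, 84.2, 95.3, 80.6]

# Paired t-test: the same participants appear in both conditions
# (a within-subject design), so the related-samples variant applies.
t_stat, p_value = stats.ttest_rel(speed_medium, speed_none)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```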