2016
DOI: 10.4108/eai.8-12-2016.151725

Identifying forensically uninteresting files in a large corpus

Abstract: For digital forensics, eliminating the uninteresting is often more critical than finding the interesting. We discuss methods exploiting the metadata of a large corpus. Tests were done with an international corpus of 262.7 million files obtained from 4018 drives. For malware investigations, we show that using a Bayesian ranking formula on metadata can increase malware recall by 5.1 while increasing precision by 1.7 times over inspecting executables alone. For more general investigations, we show that requiring …
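The abstract does not give the ranking formula itself. As a rough illustration of what a Bayesian ranking over file metadata can look like, the sketch below scores each file with naive-Bayes log odds of being malware and sorts files by that score. The feature strings (such as `ext=exe`), the Laplace smoothing constant `alpha`, the toy training data, and the helper names `train`, `log_odds`, and `rank` are all hypothetical; this is a generic technique, not the paper's formula.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical metadata features, one string per observed attribute value,
# e.g. ("ext=exe", "dir=temp"). The feature vocabulary is an assumption.
Features = Tuple[str, ...]
# (malware feature counts, benign feature counts, #malware files, #benign files)
Model = Tuple[Counter, Counter, int, int]


def train(labeled: Iterable[Tuple[Features, bool]]) -> Model:
    """Count feature occurrences separately for malware and benign training files."""
    mal, ben = Counter(), Counter()
    n_mal = n_ben = 0
    for feats, is_malware in labeled:
        if is_malware:
            n_mal += 1
            mal.update(set(feats))  # count each feature at most once per file
        else:
            n_ben += 1
            ben.update(set(feats))
    return mal, ben, n_mal, n_ben


def log_odds(feats: Features, model: Model, alpha: float = 1.0) -> float:
    """Naive-Bayes log odds that a file is malware given its metadata features.

    Only features present on the file contribute; absent features are ignored
    for simplicity. `alpha` is a Laplace smoothing constant (assumed value).
    """
    mal, ben, n_mal, n_ben = model
    score = math.log((n_mal + alpha) / (n_ben + alpha))  # smoothed prior odds
    for f in set(feats):
        p_mal = (mal[f] + alpha) / (n_mal + 2 * alpha)  # P(feature | malware)
        p_ben = (ben[f] + alpha) / (n_ben + 2 * alpha)  # P(feature | benign)
        score += math.log(p_mal / p_ben)
    return score


def rank(files: Dict[str, Features], model: Model) -> List[str]:
    """Order file paths from most to least likely to be malware."""
    return sorted(files, key=lambda path: log_odds(files[path], model), reverse=True)


if __name__ == "__main__":
    # Toy training data, invented purely for illustration.
    training = [
        (("ext=exe", "dir=temp"), True),
        (("ext=dll", "dir=temp"), True),
        (("ext=exe", "dir=system32"), False),
        (("ext=txt", "dir=docs"), False),
    ]
    model = train(training)
    files = {"a.exe": ("ext=exe", "dir=temp"), "b.txt": ("ext=txt", "dir=docs")}
    print(rank(files, model))  # ['a.exe', 'b.txt']
```

Ranking rather than hard classification lets an investigator inspect files in order of suspicion and stop when time runs out, which is how recall and precision at a given inspection budget would be traded off.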

Cited by 3 publications (1 citation statement); references 14 publications (22 reference statements).

“…For each file in the NSRL collection, the RDS includes 1) cryptographic hash values of the file's content, 2) information about the software package(s) containing the file, 3) the manufacturer of the package, 4) the original name, and 5) the size of the file. Many studies have used the hash list of RDS to identify and filter known benign files [3], [11], [30], and [31].…”
Section: A. Index Searching and Filtering (citation type: mentioning; confidence: 99%)
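The filtering described in this statement amounts to a set-membership test on content hashes. A minimal Python sketch, assuming the RDS hashes have already been loaded into a set of uppercase hexadecimal SHA-1 strings (loading code omitted, since the RDS distribution format varies between releases), could look like this. The function names `sha1_of_file` and `filter_known` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Iterator, Set


def sha1_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the uppercase hex SHA-1 of a file, reading it in chunks."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest().upper()


def filter_known(paths: Iterable[Path], known_sha1: Set[str]) -> Iterator[Path]:
    """Yield only files whose content hash is NOT in the known-file set.

    Assumes known_sha1 holds uppercase hex SHA-1 digests (normalize when loading).
    """
    for p in paths:
        if sha1_of_file(p) not in known_sha1:
            yield p
```

Files that survive this filter are not thereby suspicious; a hash miss only means the exact content is absent from the reference set, so modified or unlisted files still need examination.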