Vitor Hugo Galhardo Moia scite author profile

Digital forensics is a branch of Computer Science that investigates and analyzes electronic devices in search of evidence of crimes. There are several ways to perform this search. Known File Filter (KFF) is one of them: a list of objects of interest is used to reduce or separate the data for analysis. Holding a database of hashes of such objects, the examiner looks up matches against the target device. However, because traditional hash functions cannot detect similar objects, new methods, called approximate matching, have been designed. These functions have useful properties for KFF investigations but suffer mainly from high costs when dealing with huge data sets, as the search is usually done by brute force. To mitigate this problem, strategies have been developed to perform lookups more efficiently. In this paper, we present the state of the art in similarity digest search strategies, along with a detailed comparison covering several aspects, such as time complexity, memory requirements, and search precision. Our results show that none of the approaches addresses all of these main aspects. Finally, we discuss future directions and present requirements for a new strategy aiming to overcome current limitations.
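
The hash-database lookup described above, and the reason exact hashes miss similar objects, can be sketched in a few lines. This is a minimal illustration, not the paper's method: it assumes SHA-256 as the hash function and in-memory byte strings standing in for files, and the function names (`build_kff_database`, `kff_lookup`) are hypothetical.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

def build_kff_database(known_objects: list[bytes]) -> set[str]:
    """Hash every known object of interest into a set for fast membership tests."""
    return {sha256_hex(obj) for obj in known_objects}

def kff_lookup(database: set[str], target_files: dict[str, bytes]) -> list[str]:
    """Return the names of target files whose hash appears in the database."""
    return [name for name, data in target_files.items()
            if sha256_hex(data) in database]

# Hypothetical known objects and target-device contents.
known = [b"contraband image bytes", b"malicious script body"]
db = build_kff_database(known)

device = {
    "a.jpg": b"contraband image bytes",    # exact copy: matched
    "b.jpg": b"contraband image bytes!",   # one byte appended: missed by exact hashing
    "notes.txt": b"grocery list",          # unrelated: not matched
}
print(kff_lookup(db, device))  # ['a.jpg']
```

The near-duplicate `b.jpg` is not reported because any change to the input yields an unrelated hash; this is the gap approximate matching is meant to close.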

Understanding uses and misuses of similarity hashing functions for malware detection and family clustering in actual scenarios

Botacin, Moia, Ceschin, et al.

2021

Forensic Science International: Digital Investigation

A comparative analysis about similarity search strategies for digital forensics investigations

Moia, Henriques

2017

The Known File Filtering (KFF) method separates relevant from non-relevant information in forensic investigations using white or black lists. Due to limitations of hash functions (they cannot detect similar data), approximate matching tools have recently gained attention. However, comparing two sets of approximate matching digests by brute force can be too time-consuming. Strategies to perform lookups efficiently in digest databases have been proposed as a form of similarity search. In this paper, we compare several strategies based on the ssdeep and sdhash tools with respect to precision, memory requirements, and lookup complexity. We show that none of these strategies addresses these requirements satisfactorily.
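
To make the brute-force cost concrete, the sketch below compares every query digest with every reference digest, so the number of comparisons is the product of the two set sizes. It rests on loud assumptions: `toy_digest` (a set of hashed 4-byte windows compared with Jaccard similarity) is a hypothetical stand-in and is not ssdeep or sdhash, whose digests and 0 to 100 scores are computed differently; the 0.5 threshold is arbitrary.

```python
import hashlib
from itertools import product

def toy_digest(data: bytes, k: int = 4) -> frozenset[int]:
    """Hypothetical stand-in for an AM digest: 32-bit hashes of every k-byte window."""
    return frozenset(
        int.from_bytes(hashlib.blake2b(data[i:i + k], digest_size=4).digest(), "big")
        for i in range(max(len(data) - k + 1, 1))
    )

def similarity(a: frozenset[int], b: frozenset[int]) -> float:
    """Jaccard similarity in [0, 1]; real AM tools report their own scores."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def brute_force_search(queries: dict[str, frozenset[int]],
                       references: dict[str, frozenset[int]],
                       threshold: float = 0.5):
    """Compare every query with every reference: |queries| * |references| comparisons."""
    matches, comparisons = [], 0
    for (qname, qd), (rname, rd) in product(queries.items(), references.items()):
        comparisons += 1
        s = similarity(qd, rd)
        if s >= threshold:
            matches.append((qname, rname, round(s, 2)))
    return matches, comparisons

references = {"known.bin": toy_digest(b"the quick brown fox jumps over the lazy dog")}
queries = {
    "edited.bin": toy_digest(b"the quick brown fox jumps over the lazy cat"),
    "other.bin": toy_digest(b"completely unrelated content here"),
}
matches, comparisons = brute_force_search(queries, references)
print(comparisons)  # 2
print(matches)      # only the edited.bin / known.bin pair clears the threshold
```

Doubling both sets quadruples the number of comparisons, which is the cost that the lookup strategies discussed in the abstract try to avoid.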

Understanding the effects of removing common blocks on Approximate Matching scores under different scenarios for digital forensic investigations

Moia, Breitinger, Henriques

2019

Finding similarity in digital forensics investigations can be assisted by Approximate Matching (AM) functions. These algorithms create small, compact representations of objects (similar to hashes) that can be compared to identify similarity. However, results are often biased by common blocks (data structures found in many different files regardless of content). In this paper, we evaluate the precision and recall of AM functions when common blocks are removed. In detail, we analyze how the similarity score changes and how this affects different investigation scenarios. Results show that many irrelevant matches can be filtered out and that a new interpretation of the score allows better similarity detection.
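
The bias caused by common blocks, and the effect of filtering them out, can be reproduced with a toy model. This sketch assumes a hypothetical digest made of SHA-1 hashes of fixed 8-byte blocks scored with Jaccard similarity; it is not the feature selection of sdhash or any other AM tool, and the `min_files` rule for deciding what counts as a common block is an illustrative assumption.

```python
import hashlib
from collections import Counter

BLOCK_SIZE = 8  # hypothetical block size chosen for the example

def block_digest(data: bytes) -> set[str]:
    """Toy digest: SHA-1 hashes of fixed-size, non-overlapping blocks."""
    return {hashlib.sha1(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)}

def common_blocks(corpus: list[bytes], min_files: int) -> frozenset[str]:
    """Block hashes present in at least `min_files` files of the corpus."""
    counts: Counter[str] = Counter()
    for data in corpus:
        counts.update(block_digest(data))  # a set, so each file counts a block once
    return frozenset(h for h, n in counts.items() if n >= min_files)

def score(a: set[str], b: set[str], ignore: frozenset[str] = frozenset()) -> float:
    """Jaccard similarity computed after discarding ignored (common) blocks."""
    a, b = a - ignore, b - ignore
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Three unrelated files that share only a made-up file-format header.
HEADER = b"FMTHEAD1FMTHEAD2"
f1 = HEADER + b"alpha___beta____"
f2 = HEADER + b"gamma___delta___"
f3 = HEADER + b"epsilon_zeta____"
f1_edit = f1 + b"extra___"  # a genuinely related variant of f1

common = common_blocks([f1, f2, f3], min_files=3)
d1, d2, d1_edit = block_digest(f1), block_digest(f2), block_digest(f1_edit)

print(round(score(d1, d2), 2))               # 0.33: inflated by the shared header
print(score(d1, d2, common))                 # 0.0: nothing shared beyond the header
print(round(score(d1, d1_edit, common), 2))  # 0.67: real similarity survives
```

Removing the shared header eliminates the spurious match between `f1` and `f2` while keeping the match between `f1` and its edited copy, which mirrors the kind of filtering the abstract reports.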
