An Efficient Similarity Digests Database Lookup – A Logarithmic Divide & Conquer Approach

Breitinger, Frank; Rathgeb, Christian; Baier, Harald

doi:10.15394/jdfsl.2014.1178

Cited by 9 publications

(12 citation statements)

References 16 publications

“…As a form to mitigate the limitation of MRSH-NET [31], which only answers whether an object is present in the filter or not but does not identify the object, Breitinger et al. [32] propose a new similarity digest search strategy based on the well-known divide and conquer paradigm. In this approach, the authors build a Bloom filter-based tree data structure to store digests and efficiently locate similar objects.…”

Section: Bloom (mentioning)

confidence: 99%
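
The tree lookup described in the statement above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of Breitinger et al.: the class names `BloomFilter` and `Node`, the function `lookup`, the filter parameters, and the simple hit-count threshold `min_hits` are all hypothetical, and features are passed in as ready-made byte strings rather than extracted from file content. The root filter holds the features of every file, each child holds half of its parent's files, and a leaf identifies a single file, so a query descends only into subtrees whose filter contains enough of its features.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter; m_bits and k are illustrative defaults."""

    def __init__(self, m_bits=1 << 16, k=4):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, feature):
        # Derive k bit positions from one SHA-256 digest (4 bytes each).
        digest = hashlib.sha256(feature).digest()
        for i in range(self.k):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m

    def add(self, feature):
        for pos in self._positions(feature):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, feature):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(feature))


class Node:
    """Tree node whose filter covers every file stored below it."""

    def __init__(self, files):
        # files: non-empty list of (name, set_of_feature_bytes) pairs
        self.filter = BloomFilter()
        for _, features in files:
            for feature in features:
                self.filter.add(feature)
        self.name = files[0][0] if len(files) == 1 else None
        self.children = []
        if len(files) > 1:
            mid = len(files) // 2
            self.children = [Node(files[:mid]), Node(files[mid:])]


def lookup(node, query_features, min_hits):
    """Return names of leaves whose filter contains >= min_hits query features."""
    hits = sum(1 for feature in query_features if feature in node.filter)
    if hits < min_hits:
        return []  # prune: nothing below this node can match
    if node.name is not None:
        return [node.name]
    return [name for child in node.children
            for name in lookup(child, query_features, min_hits)]
```

With a balanced tree, a query that matches a single file touches on the order of log2(number of files) filters instead of comparing against every digest. Because Bloom filters have no false negatives, a stored file that shares at least `min_hits` features with the query is always returned; false positives remain possible.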

“…However, MRSH-NET, BF-based tree, and MRSH-CF approaches do, according to the explanation presented below. According to Breitinger et al. [32], the false positive probability for an object is calculated by $p_f = p^r$, where $p$ is the false positive probability for a single feature and $r$ the number of following features required to be found in the filter to be considered a match. While $r$ can be adjusted according to the desired false positive rate, $p$ is defined by…”

Section: Strategy's Match Decision (mentioning)

confidence: 99%

“…, where $k$ is the number of independent hash functions, $n$ the number of features inserted into the Bloom filter, and $m$ the filter size [32]. This way, the strategy decides when a match is found by a number of features present in the filter, and based on it, it can create more or less false positives depending on the parameters used by the strategy.…”

Section: Strategy's Match Decision (mentioning)

confidence: 99%
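
The match decision described in the two statements above can be made concrete with a short calculation. The sketch below assumes the standard Bloom filter approximation for the per-feature false positive probability, p = (1 - e^(-kn/m))^k, which is consistent with the quantities the quoted text names (number of hash functions, number of inserted features, filter size) but is not reproduced in the excerpt itself. The function names and the symbols `k`, `n`, `m`, `p`, and `r` are illustrative choices.

```python
import math


def feature_fp(k, n, m):
    """Per-feature false positive probability of a Bloom filter with
    k hash functions, n inserted features and m bits (standard approximation)."""
    return (1.0 - math.exp(-k * n / m)) ** k


def object_fp(p, r):
    """Probability that r independent features in a row are all false positives."""
    return p ** r


def min_run_length(p, target):
    """Smallest r such that p**r <= target, for 0 < p < 1 and 0 < target < 1."""
    return math.ceil(math.log(target) / math.log(p))
```

For example, if a single feature is a false positive half of the time, requiring three features in a row gives `object_fp(0.5, 3) == 0.125`, and `min_run_length(0.5, 0.01)` reports that seven features are needed to push the object-level rate below 1%.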

“…This way, when calculating the filter size and getting a result of $2^{31.27}$ bits, for instance, we need to adjust their size for $2^{32}$. Although this modification can almost double the theoretical size of the filter in some cases, it is necessary for practical implementations.…”

Section: Memory Requirements (mentioning)

confidence: 99%
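
The size adjustment in the statement above amounts to rounding the theoretical filter size up to the next power of two. A minimal sketch, with hypothetical function names:

```python
import math


def practical_filter_bits(theoretical_bits):
    """Round a theoretical filter size (in bits) up to the next power of two."""
    exponent = math.ceil(math.log2(theoretical_bits))
    return 1 << exponent


def rounding_overhead(theoretical_bits):
    """Ratio of the rounded size to the theoretical size (always in [1, 2))."""
    return practical_filter_bits(theoretical_bits) / theoretical_bits
```

For the quoted example, `practical_filter_bits(2 ** 31.27)` gives `2 ** 32`, an overhead factor of about 1.66. The factor approaches 2 when the theoretical size sits just above a power of two, which matches the "almost double" remark.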

Similarity Digest Search: A Survey and Comparative Analysis of Strategies to Perform Known File Filtering Using Approximate Matching

Moia; Henriques (2017). Security and Communication Networks

Digital forensics is a branch of computer science that investigates and analyzes electronic devices in search of evidence of crime. There are several ways to perform this search. Known File Filter (KFF) is one of them: a list of objects of interest is used to reduce or separate the data for analysis. Holding a database of hashes of such objects, the examiner performs lookups for matches against the target device. However, because hash functions cannot detect similar objects, new methods called approximate matching have been designed. These functions have interesting characteristics for KFF investigations but suffer mainly from high costs when dealing with huge data sets, as the search is usually done by brute force. To mitigate this problem, strategies have been developed to perform lookups more efficiently. In this paper, we present the state of the art of similarity digest search strategies, along with a detailed comparison involving several aspects, such as time complexity, memory requirement, and search precision. Our results show that none of the approaches addresses all of these main aspects. Finally, we discuss future directions and present requirements for a new strategy aiming to overcome current limitations.

“…To mitigate the limitation of MRSH-NET in answering only membership queries, Breitinger, F. et al [18] proposed a new approach for performing similarity search. This strategy is based on the well-known divide and conquer paradigm.…”

Section: Bloom Filter-based Tree Strategy (mentioning)

confidence: 99%

A comparative analysis about similarity search strategies for digital forensics investigations

Moia; Henriques (2017). Anais De XXXV Simpósio Brasileiro De Telecomunicações E Processamento De Sinais

The Known File Filtering method separates relevant from non-relevant information in forensic investigations using white or black lists. Due to limitations of hash functions (inability to detect similar data), approximate matching tools have recently gained attention. However, comparing two sets of approximate matching digests using brute force can be too time-consuming. Strategies to efficiently perform lookups in digest databases have been proposed as a form of similarity search. In this paper, we compare some strategies based on the ssdeep and sdhash tools with respect to precision, memory requirement, and lookup complexity. We show that none of these strategies addresses these requirements satisfactorily.

Creating a Map of User Data in NTFS to Improve File Carving

Karresand; Warnqvist; Lindahl; et al. (2019). Advances in Digital Forensics XV

Digital forensics and, especially, file carving are burdened by the large amounts of data that need to be processed. Attempts to solve this problem include efficient carving algorithms, parallel processing in the cloud and data reduction by filtering uninteresting files. This research addresses the problem by searching for data where it is more likely to be found. This is accomplished by creating a probability map for finding unique data at various logical block addressing positions in storage media. SHA-1 hashes of 512 B sectors are used to represent the data. The results, which are based on a collection of 30 NTFS partitions from computers running Microsoft Windows 7 and later versions, reveal that the mean probability of finding unique hash values at different logical block addressing positions varies between 12% and 41% in an NTFS partition. The probability map can be used by a forensic analyst to prioritize relevant areas in storage media without the need for a working filesystem. It can also be used to increase the efficiency of hash-based carving by dynamically changing the random sampling frequency. The approach contributes to digital forensic processes by enabling them to focus on interesting regions in storage media, increasing the probability of obtaining relevant results faster.
