Abstract: This paper presents a novel solution to the problem of determining the ownership of carved information found on disk drives and other storage media that have been used by more than one person. When a computer is subject to forensic examination, information may be found that cannot be readily ascribed to a specific user. Such information is typically not located in a specific file or directory, but is found through file carving, which recovers data from unallocated disk sectors. Because the data is carved, it has no associated file system metadata, and its owner cannot be readily ascertained. The technique presented in this paper starts by automatically recovering both file system metadata and extended metadata embedded in files (for instance, embedded timestamps) directly from a disk image. This metadata is then used to find exemplars and to create a machine learning classifier that can be used to ascertain the likely owner of the carved data. The resulting classifier is well suited for use in a legal setting because its accuracy can be verified using cross-validation and the classifier itself can be validated by manual inspection. We report results of the technique applied both to hard drive data created in our laboratory and to multiuser drives that we acquired on the secondary market. We also present a tool set that automatically creates the classifier and performs validation.
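The pipeline described in this abstract (exemplars with known owners, a classifier, and cross-validation to check accuracy) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the two numeric features (hour of an embedded timestamp and a size bucket), the owner names, and the use of a k-nearest-neighbour vote. The abstract does not say which features or which learning algorithm the authors use.

```python
from collections import Counter

def knn_predict(train, features, k=3):
    """Predict an owner by majority vote among the k nearest exemplars.

    `train` is a list of (feature_tuple, owner) pairs (hypothetical layout).
    """
    nearest = sorted(
        train,
        key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], features)),
    )[:k]
    votes = Counter(owner for _, owner in nearest)
    return votes.most_common(1)[0][0]

def cross_validate(examples, folds=5, k=3):
    """Return overall accuracy across `folds` interleaved held-out folds."""
    correct = 0
    for i in range(folds):
        test = examples[i::folds]  # every folds-th exemplar is held out
        train = [ex for j, ex in enumerate(examples) if j % folds != i]
        correct += sum(knn_predict(train, f, k) == owner for f, owner in test)
    return correct / len(examples)

# Hypothetical exemplars: (hour of embedded timestamp, size bucket) -> owner.
exemplars = (
    [((9 + i % 3, 2 + i % 2), "alice") for i in range(10)]
    + [((21 + i % 3, 8 + i % 2), "bob") for i in range(10)]
)

accuracy = cross_validate(exemplars)       # accuracy on held-out exemplars
likely_owner = knn_predict(exemplars, (10, 2))  # a carved fragment's features
```

A nearest-neighbour vote is used only because it is easy to inspect by hand: the exemplars that decided each prediction can be listed and checked, which is in the spirit of the manual validation the abstract mentions.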
Index partitioning techniques, in which indexes are broken into multiple distinct sub-indexes, are a proven way to improve metadata search speeds and scalability for large file systems, permitting early triage of the file system. A partitioned metadata index can rule out irrelevant files and quickly focus on files that are more likely to match the search criteria. Also, in a large file system that contains many users, a user's search should not include confidential files the user does not have permission to view. To meet these two parallel goals, we propose a new partitioning algorithm, Security Aware Partitioning, that integrates security with the partitioning method to enable efficient and secure file system search. To evaluate our claim of improved efficiency, we compare the results of Security Aware Partitioning to six other partitioning methods, including implementations of the metadata partitioning algorithms of SmartStore and Spyglass, two recent systems doing partitioned search in similar environments. We propose a general set of criteria for comparing partitioning algorithms, and use them to evaluate the partitioning algorithms. Our results show that Security Aware Partitioning can provide excellent search performance while building indexes at a low computational cost, O(n). Based on metrics such as information gain, we also conclude that expensive clustering algorithms do not offer enough benefit to make them worth the additional cost in time and memory.
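The idea of folding permissions into the partitioning step can be sketched as follows. This is an illustration under stated assumptions, not the Security Aware Partitioning algorithm itself: the partition key (owner uid, group gid, read bits), the record fields, and the helper names are all hypothetical, and directory traversal permissions are ignored. The sketch only shows how one pass over n records builds the partitions and how a search can skip whole partitions a user cannot read.

```python
from collections import defaultdict

def build_partitions(records):
    """Group metadata records by (uid, gid, read bits) in a single O(n) pass."""
    partitions = defaultdict(list)
    for rec in records:
        key = (rec["uid"], rec["gid"], rec["mode"] & 0o444)
        partitions[key].append(rec)
    return partitions

def can_read(key, uid, gids):
    """POSIX-style class check: owner first, then group, then other."""
    owner, group, read_bits = key
    if uid == owner:
        return bool(read_bits & 0o400)
    if group in gids:
        return bool(read_bits & 0o040)
    return bool(read_bits & 0o004)

def search(partitions, uid, gids, predicate):
    """Return matching paths, visiting only partitions the user may read."""
    hits = []
    for key, recs in partitions.items():
        if not can_read(key, uid, gids):
            continue  # whole partition ruled out without touching its records
        hits.extend(r["path"] for r in recs if predicate(r))
    return sorted(hits)

# Hypothetical metadata records.
records = [
    {"path": "/home/alice/notes.txt", "uid": 1000, "gid": 1000, "mode": 0o600, "size": 120},
    {"path": "/home/bob/data.csv", "uid": 1001, "gid": 1001, "mode": 0o600, "size": 5000},
    {"path": "/shared/readme.txt", "uid": 1001, "gid": 2000, "mode": 0o644, "size": 300},
    {"path": "/proj/plan.txt", "uid": 1001, "gid": 2000, "mode": 0o640, "size": 900},
]
partitions = build_partitions(records)
alice_hits = search(partitions, 1000, {1000}, lambda r: r["size"] > 100)
```

Because every record in a partition shares the same access decision, the permission check runs once per partition rather than once per file, and confidential files never enter the candidate set.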
While file system metadata is well characterized by a variety of workload studies, scientific metadata is much less well understood. We characterize scientific metadata in order to better understand the implications for index design. Based on our findings, existing solutions for either file system or scientific search will not suffice for indexing a large scientific file system. We describe the problems with existing solutions, and suggest column stores as an alternative approach.
While the amount of data we can process and store grows, our ability to find data remains dependent upon our own memories more often than not. Manual metadata management is common among scientific users, consuming their time while not making use of the computing resources at hand. Our system design proposes to empower users with more powerful data finding tools, such as unified search spaces, provenance, and ranked file system search. By returning the responsibility of file management to the file system, we enable scientists to focus on their science without the need for a customized file organization scheme for their work.