Benchmarking is critical when evaluating performance, but is especially difficult for file and storage systems. Complex interactions between I/O devices, caches, kernel daemons, and other OS components result in behavior that is rather difficult to analyze. Moreover, systems have different features and optimizations, so no single benchmark is always suitable. The large variety of workloads that these systems experience in the real world also adds to this difficulty. In this article we survey 415 file system and storage benchmarks from 106 recent papers. We found that most popular benchmarks are flawed and many research papers do not provide a clear indication of true performance. We provide guidelines that we hope will improve future performance evaluations. To show how some widely used benchmarks can conceal or overemphasize overheads, we conducted a set of experiments. As a specific example, slowing down read operations on ext2 by a factor of 32 resulted in only a 2--5% wall-clock slowdown in a popular compile benchmark. Finally, we discuss future work to improve file system and storage benchmarking.
Developing file systems from scratch is difficult and error prone. Layered, or stackable, file systems are a powerful technique to incrementally extend the functionality of existing file systems on commodity OSes at runtime. In this paper, we analyze the evolution of layering from historical models to what is found in four different present day commodity OSes: Solaris, FreeBSD, Linux, and Microsoft Windows. We classify layered file systems into five types based on their functionality and identify the requirements that each class imposes on the OS. We then present five major design issues that we encountered during our experience of developing over twenty layered file systems on four OSes. We discuss how we have addressed each of these issues on current OSes, and present insights into useful OS and VFS features that would provide future developers more versatile solutions for incremental file system development.
Files or even their names often contain confidential or secret information. Most users believe that such information is erased as soon as they delete a file. Even those who know that this is not true often ignore the issue. Nevertheless, recovering deleted files is trivial and can be performed even by novice hackers. The problem is exacerbated by the widespread of portable and mobile storage devices. This type of unwanted after-deletion data recovery is in part an education problem. Users believe that deleted files are erased, even though they are not. Retraining and educating users is difficult. Therefore, storage systems should behave appropriately-the data should be erased from the storage on a per-delete basis.We found that existing solutions are either inconvenient, inefficient, or insecure. We have designed Purgefs: a file system extension that transparently overwrites files on the per-delete basis. Purgefs can be automatically added to a number of existing and future file systems, including networked and stackable file systems. Purgefs supports multiple policies to trade-off performance with the level of purging guarantees. We demonstrate that Purgefs does not add overheads or perturb users' activity under typical user workloads.
This paper has three goals. (1) We try to debunk several held misconceptions about secure deletion: that encryption is an ideal solution for everybody, that existing data-overwriting tools work well, and that securely deleted files must be overwritten many times. (2) We discuss new and important issues that are often neglected: secure deletion consistency in case of power failures, handling versioning and journalling file systems, and meta-data overwriting. (3) We present two solutions for on-demand secure deletion. First, we have created a highly portable and flexible system that performs only the minimal amount of work in kernel mode. Second, we present two in-kernel solutions in the form of Ext3 file system patches that can perform comprehensive data and meta-data overwriting. We evaluated our proposed solutions and discuss the trade-offs involved.
Modern business information systems are typically multi-tiered distributed systems comprising Web services, application services, databases, enterprise information systems, file systems, storage controllers, and other storage systems. In such environments, data is stored in different forms at multiple tiers, with each tier associated with some level of data abstraction. An information entity owned by an application generally maps to several data entities, logically associated across tiers and related to the application. Discovery of such relationships in a distributed system is a challenging problem, complicated by the widespread adoption of virtualization technologies and by the traditional tendency to manage each tier as an independent domain. In this paper, we present a system and methodology for model-driven discovery of end-to-end application-data relationships spanning multiple tiers, from the applications to the lowest levels of the storage hierarchy. The key to our methodology involves modeling how data is used and transformed by distributed software components. An important benefit of our system, which we call Galapagos, is the ability to reflect business decisions expressed at the application level to the level of storage.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.