We analyze how file systems and modern data-intensive applications react to
fsync
failures. First, we characterize how three Linux file systems (ext4, XFS, Btrfs) behave in the presence of failures. We find commonalities across file systems (pages are always marked clean, certain block writes always lead to unavailability) as well as differences (page content and failure reporting is varied). Next, we study how five widely used applications (PostgreSQL, LMDB, LevelDB, SQLite, Redis) handle
fsync
failures. Our findings show that although applications use many failure-handling strategies, none are sufficient:
fsync
failures can cause catastrophic outcomes such as data loss and corruption. Our findings have strong implications for the design of file systems and applications that intend to provide strong durability guarantees.
Search engines for storage systems rely on crawlers to gather the list of files that need to be indexed. The recency of an index is determined by the speed at which this list can be gathered. While there has been a substantial amount of literature on building efficient web crawlers, there is very little literature on file system crawlers. In this paper we discuss the challenges in building a file system crawler. We then present the design of two file system crawlers: the first uses the standard POSIX file system API but carefully controls the amount of memory and CPU that it uses. The second leverages modifications to the file systems's internals, and a new API called SnapDiff, to detect modified files rapidly. For both crawlers we describe the incremental differencing design; the method to produce a list of changes between a previous crawl and the current point in time.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.