Recent increases in CPU performance have surpassed those in hard drives. As a result, disk operations have become more expensive in terms of the number of CPU cycles spent waiting for them to complete. File prediction can mitigate this problem by prefetching files into cache before they are accessed. Identifying relationships between individual files plays a key role in successfully performing file prefetching. It is well-known that previous patterns of file references can be used to predict future references. Nevertheless, knowledge about the programs producing the relationships between individual files has rarely been investigated. We present a Program-Based Successor (PBS) model that identifies relationships between files through the names of the programs accessing them. We develop a Program-based Last Successor (PLS) model derived from PBS to do file prediction. Our simulation results show that PLS makes 21 % fewer incorrect predictions and roughly the same number of correct predictions as the Last-Successor (LS) model. We also examine the cache hit ratio achieved by applying PLS to the Least Recently Used (LRU) caching algorithm and show that a cache using PLS and LRU together can perform better than a cache up to 40 times larger using LRU alone. Finally, we argue that because program-based successors are more likely to be used soon, incorrectly prefetched program-based successors are more likely to be used and thus less incorrect than incorrectly prefetched files from non-program-based models.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.