Abstract: Data-intensive scientific workflows exhibit inter-task dependencies that generate file-based communication patterns. In such scenarios, traditional disk-based storage systems often limit overall application performance and scalability. To overcome this storage bottleneck, in-memory runtime distributed file systems speed up application I/O. Such systems are deployed statically onto a fixed number of compute nodes and act as a distributed, fast I/O cache for data generated at runtime. This static deployment scheme has two major drawbacks. First, the user must estimate the size of the generated data in advance, which is often difficult; underestimating it causes the application to fail. Second, because applications exhibit significant variability in their data footprint and in the parallelism they achieve at runtime, static deployment also leads to severe resource underutilization. To address these limitations, we present MemEFS, an elastic in-memory runtime distributed file system. MemEFS scales elastically, acquiring or releasing resources as application storage demands change. Our evaluation shows that, while incurring modest runtime overhead, MemEFS increases resource utilization efficiency by up to 65%.
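To make the elasticity idea concrete, the sketch below illustrates one plausible way a system like MemEFS could decide when to acquire or release storage nodes: a threshold-based watermark policy driven by aggregate memory utilization. This is only an illustrative assumption, not MemEFS's published mechanism; all names here (Cluster, acquire_node, release_node, rebalance) and the watermark values are hypothetical.

```python
# Minimal sketch of a threshold-based elastic scaling policy for an
# in-memory distributed file system. Hypothetical, NOT MemEFS's actual
# implementation: class/function names and thresholds are assumptions.

HIGH_WATERMARK = 0.85  # scale up above 85% aggregate memory utilization
LOW_WATERMARK = 0.50   # scale down below 50% aggregate memory utilization


class Cluster:
    """Hypothetical stand-in for the pool of in-memory storage nodes."""

    def __init__(self, nodes: int, node_capacity_gb: float):
        self.nodes = nodes
        self.node_capacity_gb = node_capacity_gb

    def utilization(self, used_gb: float) -> float:
        """Fraction of total in-memory storage capacity currently in use."""
        return used_gb / (self.nodes * self.node_capacity_gb)

    def acquire_node(self) -> None:
        self.nodes += 1

    def release_node(self) -> None:
        if self.nodes > 1:  # always keep at least one storage node
            self.nodes -= 1


def rebalance(cluster: Cluster, used_gb: float) -> None:
    """One step of the elasticity loop, driven by storage demand."""
    u = cluster.utilization(used_gb)
    if u > HIGH_WATERMARK:
        cluster.acquire_node()   # grow before the application runs out of space
    elif u < LOW_WATERMARK:
        cluster.release_node()   # shrink to avoid holding idle memory
```

A policy like this captures the trade-off the abstract alludes to: the high watermark prevents the application failures that static underprovisioning would cause, while the low watermark reclaims nodes during phases of low data footprint, improving resource utilization.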