-This paper evaluates the potential gains a workflow-aware storage system can bring. Two observations make us believe such storage system is crucial to efficiently support workflow-based applications: First, workflows generate irregular and application-dependent data access patterns. These patterns render existing storage systems unable to harness all optimization opportunities as this often requires conflicting optimization options or even conflicting design decision at the level of the storage system. Second, when scheduling, workflow runtime engines make suboptimal decisions as they lack detailed data location information. This paper discusses the feasibility, and evaluates the potential performance benefits brought by, building a workflow-aware storage system that supports per-file access optimizations and exposes data location. To this end, this paper presents approaches to determine the application-specific data access patterns, and evaluates experimentally the performance gains of a workflowaware storage approach. Our evaluation using synthetic benchmarks shows that a workflow-aware storage system can bring significant performance gains: up to 7x performance gain compared to the distributed storage system -MosaStore and up to 16x compared to a central, well provisioned, NFS server.
International audienceThis article evaluates the potential gains a workflow-aware storage system can bring. Two observations make us believe such storage system is crucial to efficiently support workflow-based applications: First, workflows generate irregular and application-dependent data access patterns. These patterns render existing generic storage systems unable to harness all optimization opportunities as this often requires enabling conflicting optimizations or even conflicting design decisions at the storage system level. Second, most workflow runtime engines make suboptimal scheduling decisions as they lack the detailed data location information that is generally hidden by the storage system. This paper presents a limit study that evaluates the potential gains from building a workflow-aware storage system that supports per-file access optimizations and exposes data location. Our evaluation using synthetic benchmarks and real applications shows that a workflow-aware storage system can bring significant performance gains: up to 3x performance gains compared to a vanilla distributed storage system deployed on the same resources yet unaware of the possible file-level optimizations
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.