Serverless network file systems

Anderson, Thomas E.; Dahlin, Mike; Neefe, Jeanna M.; Patterson, David A.; Roselli, Drew; Wang, Randolph Y.

doi:10.1145/225535.225537

Cited by 214 publications

(178 citation statements)

References 19 publications

Mentioning: 177

“…Swift [9], Zebra [25] and xFS [1] employ RAID-4/5 to improve redundancy. Swift conducts file striping so that large files benefit from access parallelism.…”

Section: Related Work (mentioning)

confidence: 99%

CEFT: A cost-effective, fault-tolerant parallel virtual file system

Zhu

Jiang

2006

Journal of Parallel and Distributed Computing


The vulnerability of computer nodes due to component failures is a critical issue for cluster-based file systems. This paper studies the development and deployment of mirroring in cluster-based parallel virtual file systems to provide fault tolerance and analyzes the tradeoffs between the performance and the reliability in the mirroring scheme. It presents the design and implementation of CEFT, a scalable RAID-10 style file system based on PVFS, and proposes four novel mirroring protocols depending on whether the mirroring operations are server-driven or client-driven, whether they are asynchronous or synchronous. The comparisons of their write performances, measured in a real cluster, and their reliability and availability, obtained through analytical modeling, show that these protocols strike different tradeoffs between the reliability and performance. Protocols with higher peak write performance are less reliable than those with lower peak write performance, and vice versa. A hybrid protocol is proposed to optimize this tradeoff.

“…For example, the Lightweight File Systems [26] project at Sandia Labs has stripped down POSIX semantics to a core of authentication and authorization affording layering of other semantics, like consistency, on an as-needed basis. Other file systems like the Serverless File System [3] have distributed metadata weakening the immediate consistency across the entire network of machines. NFS [24] relies on write-back local caches limiting the globally consistent view of the file system to the last synchronization operation.…”

Section: Related Work (mentioning)

confidence: 99%

Advanced I/O for large-scale scientific applications.

Klasky

Schwan

Oldfield

et al. 2010


As scientific simulations scale to use petascale machines and beyond, the data volumes generated pose a dual problem. First, with increasing machine sizes, the careful tuning of IO routines becomes more and more important to keep the time spent in IO acceptable. It is not uncommon, for instance, to have 20% of an application's runtime spent performing IO in a 'tuned' system. Careful management of the IO routines can move that to 5% or even less in some cases. Second, the data volumes are so large, on the order of 10s to 100s of TB, that trying to discover the scientifically valid contributions requires assistance at runtime to both organize and annotate the data. Waiting for offline processing is not feasible due both to the impact on the IO system and the time required. To reduce this load and improve the ability of scientists to use the large amounts of data being produced, new techniques for data management are required. First, there is a need for techniques for efficient movement of data from the compute space to storage. These techniques should understand the underlying system infrastructure and adapt to changing system conditions. Technologies include aggregation networks, data staging nodes for a closer parity to the IO subsystem, and autonomic IO routines that can detect system bottlenecks and choose different approaches, such as splitting the output into multiple targets, staggering output processes. Such methods must be end-to-end, meaning that even with properly managed asynchronous techniques, it is still essential to properly manage the later synchronous interaction with the storage system to maintain acceptable performance. Second, for the data being generated, annotations and other metadata must be incorporated to help the scientist understand output data for the simulation run as a whole, to select data and data features without concern for what files or other storage technologies were employed.
All of these features should be attained while maintaining a simple deployment for the science code and eliminating the need for allocation of additional computational resources.


“…To our knowledge, only few distributed file systems have been designed with fully symmetric constraints [4,5]. The implementation complexity of such systems is generally dissuasive and the current trend consists in setting up distributed file systems composed by one or two meta-data servers and several I/O servers [6,7].…”

Section: Kernel Distributed File System (mentioning)

confidence: 99%

Reducing Kernel Development Complexity in Distributed Environments

Lèbre

Lottiaux

Focht

et al. 2008

Lecture Notes in Computer Science


Abstract. Setting up generic and fully transparent distributed services for clusters implies complex and tedious kernel developments. More flexible approaches such as user-space libraries are usually preferred with the drawback of requiring application recompilation. A second approach consists in using specific kernel modules (such as FUSE in GNU/Linux systems) to transfer kernel complexity into user space. In this paper, we present a new way to develop kernel distributed services for clusters by using a cluster wide consistent data management service. This system, entitled kDDM for "kernel Distributed Data Management", offers flexible kernel mechanisms to transparently manage remote accesses, cache and coherency. We show how kDDM simplifies distributed kernel developments by presenting the design and the implementation of a service as complex as a fully symmetric distributed file system. The innovative approach of kDDM has the potential to boost the development of distributed kernel services because it relieves the developers of the burden of dealing with distributed protocols and explicit data transfers.
