As ever-larger cohorts of human genomes are collected in pursuit of genotype/phenotype associations, sequencing informatics must scale up to yield complete and accurate genotypes from vast raw datasets. Joint variant calling, a data processing step entailing simultaneous analysis of all participants sequenced, exhibits this scaling challenge acutely. We present GLnexus (GL, Genotype Likelihood), a system for joint variant calling designed to scale up to the largest foreseeable human cohorts. GLnexus combines scalable joint calling algorithms with a persistent database that grows efficiently as additional participants are sequenced. We validate GLnexus using 50,000 exomes to show it produces comparable or better results than existing methods, at a fraction of the computational cost with better scaling. We provide a standalone opensource version of GLnexus and a DNAnexus cloudnative deployment supporting very large projects, which has been employed for cohorts of >240,000 exomes and >22,000 whole-genomes.
zFS is a research project aimed at building a decentralized file system that distributes all aspects of file and storage management over a set of cooperating machines interconnected by a high-speed network. zFS is designed to be a file system that scales from a few networked computers to several thousand machines and to be built from commodity off-the-shelf components.The two most prominent features of zFS are its cooperative cache and distributed transactions. zFS integrates the memory of all participating machines into one coherent cache. Thus, instead of going to the disk for a block of data already in one of the machine memories, zFS retrieves the data block from the remote machine. zFS also uses distributed transactions and leases, instead of groupcommunication and clustering software.This article describes the zFS high-level architecture and how its goals are achieved.
B-trees are used by many file systems to represent files and directories. They provide guaranteed logarithmic time key-search, insert, and remove. File systems like WAFL and ZFS use shadowing, or copy-on-write, to implement snapshots, crash recovery, write-batching, and RAID. Serious difficulties arise when trying to use b-trees and shadowing in a single system.
This article is about a set of b-tree algorithms that respects shadowing, achieves good concurrency, and implements cloning (writeable snapshots). Our cloning algorithm is efficient and allows the creation of a large number of clones.
We believe that using our b-trees would allow shadowing file systems to better scale their on-disk data structures.
Abstract-The Horus and Ensemble efforts culminated a multi-year Cornell research program in process group communication used for fault-tolerance, security and adaptation. Our intent was to understand the degree to which a single system could offer flexibility and yet maintain high performance, to explore the integration of fault-tolerance with security and real-time mechanisms, and to increase trustworthiness of our solutions by applying formal methods. Here, we summarize the accomplishments of the effort and evaluate the successes and failures of the approach.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.