Motivation
With high-throughput genotyping systems now available, it has become feasible to fully integrate genotyping information into breeding programs. To make use of this information effectively requires DNA extraction facilities and marker production facilities that can efficiently deploy the desired set of markers across samples with a rapid turnaround time that allows for selection before crosses needed to be made. In reality, breeders often have a short window of time to make decisions by the time they are able to collect all their phenotyping data and receive corresponding genotyping data. This presents a challenge to organize information and utilize it in downstream analyses to support decisions made by breeders. In order to implement genomic selection routinely as part of breeding programs, one would need an efficient genotyping data storage system. We selected and benchmarked six popular open-source data storage systems, including relational database management and columnar storage systems.
Results
We found that data extract times are greatly influenced by the orientation in which genotype data is stored in a system. HDF5 consistently performed best, in part because it can more efficiently work with both orientations of the allele matrix.
Availability
http://gobiin1.bti.cornell.edu:6083/projects/GBM/repos/benchmarking/browse
Abstract. Intermittent failures and nondeterministic behavior complicate and compromise the effectiveness of software testing and debugging. To increase the observability of software faults, we explore the effect hardware configurations and processor load have on intermittent failures and the nondeterministic behavior of software systems. We conducted a case study on Mozilla Firefox with a selected set of reported field failures. We replicated the conditions that caused the reported failures ten times on each of nine hardware configurations by varying processor speed, memory, hard drive capacity, and processor load. Using several observability tools, we found that hardware configurations that had less processor speed and memory observed more failures than others. Our results also show that by manipulating processor load, we can influence the observability of some faults.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.