An obstacle to validating and benchmarking methods for genome analysis is that there are few reference datasets available for which the “ground truth” about the mutational landscape of the sample genome is known and fully validated. Additionally, the free and public availability of real human genome datasets is incompatible with the preservation of donor privacy. In order to better analyze and understand genomic data, we need test datasets that model all variants, reflecting known biology as well as sequencing artifacts. Read simulators can fulfill this requirement, but are often criticized for limited resemblance to true data and overall inflexibility. We present NEAT (NExt-generation sequencing Analysis Toolkit), a set of tools that not only includes an easy-to-use read simulator, but also scripts to facilitate variant comparison and tool evaluation. NEAT has a wide variety of tunable parameters which can be set manually on the default model or parameterized using real datasets. The software is freely available at github.com/zstephens/neat-genreads.
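To illustrate what a known "ground truth" means for a simulated dataset, here is a toy sketch of a read simulator. It is not NEAT's model or interface: the function `simulate_reads`, its parameters, the single-nucleotide-variant-only variant format, and the uniform substitution error model are all assumptions made for illustration.

```python
import random


def simulate_reads(reference, variants, n_reads, read_len, error_rate, seed=0):
    """Toy read simulator (hypothetical, not NEAT's algorithm).

    reference  : reference sequence as a string over A/C/G/T
    variants   : dict mapping 0-based position -> alternate base (SNVs only)
    error_rate : per-base probability of a substitution sequencing error
    Returns a list of (start_position, read_sequence) tuples.
    """
    rng = random.Random(seed)

    # Apply the known variants to obtain the sample genome (the ground truth).
    genome = list(reference)
    for pos, alt in variants.items():
        genome[pos] = alt
    genome = "".join(genome)

    reads = []
    for _ in range(n_reads):
        # Sample a read start uniformly along the sample genome.
        start = rng.randrange(len(genome) - read_len + 1)
        bases = []
        for base in genome[start:start + read_len]:
            # Inject a substitution error with probability error_rate.
            if rng.random() < error_rate:
                base = rng.choice([b for b in "ACGT" if b != base])
            bases.append(base)
        reads.append((start, "".join(bases)))
    return reads


if __name__ == "__main__":
    ref = "ACGT" * 10
    for start, read in simulate_reads(ref, {5: "T"}, n_reads=5, read_len=8, error_rate=0.01):
        print(start, read)
```

Because the `variants` dictionary is chosen by the user, any variant caller run on the simulated reads can be scored against it directly.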
Programmable Logic Controllers (PLCs) are an established platform, widely used throughout industrial automation but poorly understood among researchers. This paper gives an overview of the state of the practice, explaining why this settled technology persists throughout industry and presenting a critical analysis of the strengths and weaknesses of the dominant programming styles for today's PLC-based automation systems. We describe the software execution patterns that are standardized loosely in IEC 61131-3. We identify opportunities for improvements that would enable increasingly complex industrial automation applications while strengthening safety and reliability. Specifically, we propose deterministic, distributed programming models that embrace explicit timing, event-triggered computation, and improved security.
Genomic variant discovery is frequently performed using the GATK Best Practices variant calling pipeline, a complex workflow with multiple steps, fans/merges, and conditionals. This complexity makes management of the workflow difficult on a computer cluster, especially when running in parallel on large batches of data: hundreds or thousands of samples at a time. Here we describe a wrapper for the GATK-based variant calling workflow using the Swift/T parallel scripting language. Standard built-in features include the flexibility to split by chromosome before variant calling, optionally permitting the analysis to continue when faulty samples are detected, and allowing users to analyze multiple samples in parallel within each cluster node. The use of Swift/T conveys two key advantages: (1) thanks to the embedded ability of Swift/T to transparently operate in multiple cluster scheduling environments (PBS Torque, SLURM, Cray aprun environment, etc.), a single workflow is trivially portable across numerous clusters; (2) the leaf functions of Swift/T permit developers to easily swap executables in and out of the workflow, conditional on the analyst's choice, which makes the workflow easy to maintain. This modular design permits separation of the workflow into multiple stages and the request of resources optimal for each stage of the pipeline. While Swift/T's implicit data-level parallelism eliminates the need for the developer to code parallel analysis of multiple samples, it does make debugging of the workflow a bit more difficult, as is the case with any implicitly parallel code.
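The three features listed above (splitting by chromosome, continuing past faulty samples, and running several samples in parallel) can be sketched in plain Python. This is not Swift/T code and not the authors' workflow: it swaps in a Python thread pool for Swift/T's implicit parallelism, and `call_variants`, `run_batch`, and the `keep_going` flag are hypothetical names introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def call_variants(sample, chromosome):
    """Hypothetical stand-in for a real per-chromosome variant calling step."""
    if sample.startswith("bad"):
        raise ValueError(f"faulty input for {sample}")
    return f"{sample}.{chromosome}.vcf"


def run_batch(samples, chromosomes, task=call_variants, keep_going=True, max_workers=4):
    """Run `task` for every (sample, chromosome) pair in parallel.

    Returns (results, failures), both keyed by (sample, chromosome).
    With keep_going=False, the first failure encountered is re-raised.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per (sample, chromosome) pair: the split-by-chromosome step.
        futures = {
            pool.submit(task, sample, chrom): (sample, chrom)
            for sample in samples
            for chrom in chromosomes
        }
        for future in as_completed(futures):
            key = futures[future]
            try:
                results[key] = future.result()
            except Exception as exc:
                if not keep_going:
                    raise
                # Record the failure and let the other samples finish.
                failures[key] = str(exc)
    return results, failures


if __name__ == "__main__":
    ok, failed = run_batch(["s1", "bad2", "s3"], ["chr1", "chr2"])
    print(sorted(ok))
    print(sorted(failed))
```

In this sketch the swappable `task` argument plays a role loosely analogous to the leaf functions described above: the dispatch logic stays the same while the executable behind each step changes.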
With the above features, users have a powerful and portable way to scale up their variant calling analysis to run in many traditional computer cluster architectures. https://github.com/ncsa/Swift-T-Variant-Calling http://swift-t-variant-calling.readthedocs.io/en/latest/
Advancements in sequencing technology [1, 2] have paved the way for many applications of Whole Genome Sequencing (WGS) and Whole Exome Sequencing (WES) in genomic research and the clinic [3, 4]. One of these applications is genomic variant calling, commonly performed in accordance with the Best Practices established by the GATK team (Genome Analysis Toolkit) [5-7]. This methodology involves constructing a complex workflow that could be hard to manage especially for large sample sizes (hundreds and beyond, [8-10]) that necessitate the use of large computer clusters. In such cases, features like resiliency and auto-restart in case of node failures, tracking of individual samples, efficient node utilization, and easy debugging of errors and failures are very important. Without a high-quality workflow manager, these requirements can be difficult to satisfy, resulting in error-prone workflow development, maintenance and execution. An additional challenge is porting the workflow among different computing environments, a common need in collaborative and consortium projects. Monolithic solutions, where a single executable runs the entire analy...