Michael B. Hall scite author profile

Data analysis often entails a multitude of heterogeneous steps, from the application of various command line tools to the usage of scripting languages like R or Python for the generation of plots and tables. It is widely recognized that data analyses should ideally be conducted in a reproducible way. Reproducibility enables technical validation and regeneration of results on the original or even new data. However, reproducibility alone is by no means sufficient to deliver an analysis that is of lasting impact (i.e., sustainable) for the field, or even just one research group. We postulate that it is equally important to ensure adaptability and transparency. The former describes the ability to modify the analysis to answer extended or slightly different research questions. The latter describes the ability to understand the analysis in order to judge whether it is not only technically, but methodologically valid. Here, we analyze the properties needed for a data analysis to become reproducible, adaptable, and transparent. We show how the popular workflow management system Snakemake can be used to guarantee this, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing, to quality control and fine-grained, interactive exploration and plotting of final results.


Sustainable data analysis with Snakemake

Mölder et al. 2021


Antibiotic resistance prediction for Mycobacterium tuberculosis from genome sequence data with Mykrobe

Hunt, Bradley, Lapierre et al. 2019

Wellcome Open Res

135

144


Two billion people are infected with Mycobacterium tuberculosis, leading to 10 million new cases of active tuberculosis and 1.5 million deaths annually. Universal access to drug susceptibility testing (DST) has become a World Health Organization priority. We previously developed a software tool, Mykrobe predictor, which provided offline species identification and drug resistance predictions for M. tuberculosis from whole genome sequencing (WGS) data. Performance was insufficient to support the use of WGS as an alternative to conventional phenotype-based DST, due to mutation catalogue limitations.


Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning

et al. 2018


Sequencing by translocating DNA fragments through an array of nanopores is a rapidly maturing technology that offers faster and cheaper sequencing than other approaches. However, accurately deciphering the DNA sequence from the noisy and complex electrical signal is challenging. Here, we report Chiron, the first deep learning model to achieve end-to-end basecalling and directly translate the raw signal to DNA sequence without the error-prone segmentation step. Trained with only a small set of 4,000 reads, we show that our model provides state-of-the-art basecalling accuracy, even on previously unseen species. Chiron achieves basecalling speeds of more than 2,000 bases per second using desktop computer graphics processing units.


Chiron: Translating nanopore raw signal directly into nucleotide sequence using deep learning

Teng, Hall, Duarte et al. 2017

Preprint


Sequencing by translocating DNA fragments through an array of nanopores is a rapidly maturing technology which offers faster and cheaper sequencing than other approaches. However, accurately deciphering the DNA sequence from the noisy and complex electrical signal is challenging. Here, we report the first deep learning model, named Chiron, that can directly translate the raw signal to DNA sequence without the error-prone segmentation step. We show that our model provides state-of-the-art basecalling accuracy when trained with only a small set of 4000 reads. Chiron achieves basecalling speeds of over 2000 bases per second using desktop computer graphics processing units, making it competitive with other deep-learning basecalling algorithms.


