BayesCall: A model-based base-calling algorithm for high-throughput short-read sequencing

Kao, Wei-Chun; Stevens, Kristian; Song, Yun S.

doi:10.1101/gr.095299.109

Cited by 82 publications

(103 citation statements)

References 10 publications

Supporting

Mentioning: 103

Contrasting

“…The bias in the distributions of fluorescence intensities appears in later sequencing cycles, which can be alleviated by an intensity normalization [171]. A number of improved base callers have been developed to reduce the error rate for each platform, including Rsolid [171] for the SOLiD platform, Pyrobayes [172] for the 454 platform, and BayesCall [173], Ibis [174], Seraphim [175], and AYB [176] for the Illumina platform. Base-calling algorithms use quality scores to estimate error probabilities for each base call, most of which can be transformed to Phred quality score (Q) [177].…”

Section: Bioinformatics Challenges and Solutions

mentioning

confidence: 99%

Next-generation sequencing in the clinic: Promises and challenges

et al. 2013

The advent of next generation sequencing (NGS) technologies has revolutionized the field of genomics, enabling fast and cost-effective generation of genome-scale sequence data with exquisite resolution and accuracy. Over the past years, rapid technological advances led by academic institutions and companies have continued to broaden NGS applications from research to the clinic. A recent crop of discoveries have highlighted the medical impact of NGS technologies on Mendelian and complex diseases, particularly cancer. However, the ever-increasing pace of NGS adoption presents enormous challenges in terms of data processing, storage, management and interpretation as well as sequencing quality control, which hinder the translation from sequence data into clinical practice. In this review, we first summarize the technical characteristics and performance of current NGS platforms. We further highlight advances in the applications of NGS technologies towards the development of clinical diagnostics and therapeutics. Common issues in NGS workflows are also discussed to guide the selection of NGS platforms and pipelines for specific research purposes.
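
The citation statement above notes that base-caller error probabilities can be transformed to a Phred quality score (Q). As a minimal sketch of that transformation, assuming the standard Phred definition Q = -10 log10(P), the conversion in both directions might look like the following. The function names are illustrative and are not taken from BayesCall or any other base caller.

```python
import math


def phred_from_error_prob(p: float) -> float:
    """Convert a per-base error probability P into a Phred score Q = -10*log10(P)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(p)


def error_prob_from_phred(q: float) -> float:
    """Invert the Phred transform: P = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    # An error probability of 1 in 1000 corresponds to roughly Q30.
    print(round(phred_from_error_prob(0.001), 2))
    # Q20 corresponds to an error probability of roughly 1 in 100.
    print(error_prob_from_phred(20))
```

Under this definition, every 10-point increase in Q corresponds to a tenfold drop in the estimated error probability.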

“…Base calling was done using Illumina Bustard (Kao et al 2009) and quality control with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).…”

Section: DNA Samples Sequencing and Data Processing

mentioning

confidence: 99%

Great ape Y Chromosome and mitochondrial DNA phylogenies reflect subspecies structure and patterns of mating and dispersal

et al. 2016

The distribution of genetic diversity in great ape species is likely to have been affected by patterns of dispersal and mating. This has previously been investigated by sequencing autosomal and mitochondrial DNA (mtDNA), but large-scale sequence analysis of the male-specific region of the Y Chromosome (MSY) has not yet been undertaken. Here, we use the human MSY reference sequence as a basis for sequence capture and read mapping in 19 great ape males, combining the data with sequences extracted from the published whole genomes of 24 additional males to yield a total sample of 19 chimpanzees, four bonobos, 14 gorillas, and six orangutans, in which interpretable MSY sequence ranges from 2.61 to 3.80 Mb. This analysis reveals thousands of novel MSY variants and defines unbiased phylogenies. We compare these with mtDNA-based trees in the same individuals, estimating time-to-most-recent common ancestor (TMRCA) for key nodes in both cases. The two loci show high topological concordance and are consistent with accepted (sub)species definitions, but time depths differ enormously between loci and (sub)species, likely reflecting different dispersal and mating patterns. Gorillas and chimpanzees/bonobos present generally low and high MSY diversity, respectively, reflecting polygyny versus multimale-multifemale mating. However, particularly marked differences exist among chimpanzee subspecies: The western chimpanzee MSY phylogeny has a TMRCA of only 13.2 (10.8-15.8) thousand years, but that for central chimpanzees exceeds 1 million years. Cross-species comparison within a single MSY phylogeny emphasizes the low human diversity, and reveals species-specific branch length variation that may reflect differences in long-term generation times.

“…There are two main approaches to addressing this challenge: (1) One approach is to develop improved image analysis and base-calling algorithms. This line of work has been pursued by several researchers in the past, including ourselves (for review, see Erlich et al 2008; Rougemont et al 2008; Kao et al 2009; Kircher et al 2009; Whiteford et al 2009; Kao and Song 2011). Indeed, by using more sophisticated statistical methods, it has been demonstrated that it is possible to deliver significant improvements over the tools developed by the manufacturers of the sequencing platforms.…”

mentioning

confidence: 99%

ECHO: A reference-free short-read error correction algorithm

Kao, Chan, Song 2011, Genome Res. (self-citation)

Developing accurate, scalable algorithms to improve data quality is an important computational challenge associated with recent advances in high-throughput sequencing technology. In this study, a novel error-correction algorithm, called ECHO, is introduced for correcting base-call errors in short-reads, without the need of a reference genome. Unlike most previous methods, ECHO does not require the user to specify parameters of which optimal values are typically unknown a priori. ECHO automatically sets the parameters in the assumed model and estimates error characteristics specific to each sequencing run, while maintaining a running time that is within the range of practical use. ECHO is based on a probabilistic model and is able to assign a quality score to each corrected base. Furthermore, it explicitly models heterozygosity in diploid genomes and provides a reference-free method for detecting bases that originated from heterozygous sites. On both real and simulated data, ECHO is able to improve the accuracy of previous error-correction methods by several folds to an order of magnitude, depending on the sequence coverage depth and the position in the read. The improvement is most pronounced toward the end of the read, where previous methods become noticeably less effective. Using a whole-genome yeast data set, it is demonstrated here that ECHO is capable of coping with nonuniform coverage. Also, it is shown that using ECHO to perform error correction as a preprocessing step considerably facilitates de novo assembly, particularly in the case of low-to-moderate sequence coverage depth.

[Supplemental material is available for this article. ECHO is publicly available at http://uc-echo.sourceforge.net under the Berkeley Software Distribution License.]

Over the past few years, next-generation sequencing (NGS) technologies have introduced a rapidly growing wave of information in biological sciences; see Metzker (2010) for a recent review of NGS platforms and their applications. Exploiting massive parallelization, NGS platforms generate high-throughput data at very low cost per base. An important computational challenge associated with this rapid technological advancement is to develop efficient algorithms to extract accurate sequence information. In comparison with traditional Sanger sequencing (Sanger et al. 1977), NGS data have shorter read lengths and higher error rates, and these characteristics create many challenges for computation, especially when a reference genome is not available. Reducing the error rate of base-calls and improving the accuracy of base-specific quality scores have important practical implications for assembly (Sundquist et al.
