Unique, dual-indexed sequencing adapters with UMIs effectively eliminate index cross-talk and significantly improve sensitivity of massively parallel sequencing

MacConaill, Laura E.; Burns, Robert T.; Nag, Anwesha; Coleman, Haley A.; Slevin, Michael K.; Giorda, Kristina; Light, Madelyn; Lai, Kevin; Jarosz, Mirna; McNeill, Matthew; Ducar, Matthew D.; Meyerson, Matthew; Thorner, Aaron R.

doi:10.1186/s12864-017-4428-5

Cited by 182 publications

(159 citation statements)

References 23 publications

Supporting

Mentioning

154

Contrasting


“…To determine whether BGISEQ-500 sequencing accuracy is affected by index hopping, as occurs with Illumina's sequencers [3,4,8-11], we examined the rate of index mis-assignment in BGISEQ-500 runs. We ligated eight unique single indexes to eight gene regions, respectively (indexes 1-8) (Supplementary Table 1) or to eight water controls lacking DNA inputs (indexes 33-40), and we pooled equal volumes of all samples after PCR amplification.…”

Section: Index Mis-assignment In Controls (mentioning)

confidence: 99%

Reliable Multiplex Sequencing with Rare Index Mis-Assignment on DNB-Based NGS Platform

Zhao, Zhang, et al. 2018

Preprint


Accurate next generation sequencing (NGS) is critical for understanding genetic predisposition to human disease and thus aiding clinical diagnosis and personalized precision medicine. Recent breakthroughs in massively parallel sequencing, especially when coupled with sample multiplexing, have driven sequencing cost down and made clinical genetic tests broadly affordable. However, intractable index mis-assignment (commonly exceeds 1%) has been reported on some widely used sequencing platforms. Burdensome unique dual indexing is now used to reduce this problem. Here, we investigated this quality issue on BGI sequencers using three major library preparation methods: whole genome sequencing (WGS) with PCR, PCR-free WGS, and two-step targeted PCR. BGI's sequencers utilize a unique DNA nanoball (DNB) technology that is based on rolling circle replication (RCR) for array preparation; this linear amplification is PCR free and can avoid error accumulation. We demonstrate here that single index mis-assignment from free indexed oligos on these sequencers occurs at a rate of only one in 36 million reads, suggesting virtually no index hopping during DNB creation and arraying, as expected for the RCR process. Furthermore, the DNB-based NGS applications have achieved an unprecedentedly low sample-to-sample misassignment rate of 0.0001% to 0.0004% using only single indexing. Therefore, single indexing with DNB sequencing technology provides a simple but effective method for sensitive research and clinical genetic assays that require the detection of low abundance sequences in a large number of samples.
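The control design quoted above (indexes attached to water controls with no DNA input) suggests one simple way to estimate a mis-assignment rate: divide the reads that demultiplex to the empty-control indexes by the total read count. The sketch below is only an illustration of that arithmetic. The function name, the index labels, and all read counts are made up for the example and are not taken from the cited study, and the estimate ignores hops between real samples, which empty controls cannot detect.

```python
def misassignment_rate(reads_per_index, empty_indexes):
    """Fraction of all reads that were assigned to indexes with no DNA input."""
    total = sum(reads_per_index.values())
    if total == 0:
        raise ValueError("no reads to evaluate")
    stray = sum(reads_per_index.get(idx, 0) for idx in empty_indexes)
    return stray / total


# Hypothetical counts: eight real samples and eight water controls.
sample_indexes = [f"idx{i}" for i in range(1, 9)]
control_indexes = [f"idx{i}" for i in range(33, 41)]

counts = {idx: 4_500_000 for idx in sample_indexes}
counts.update({idx: 0 for idx in control_indexes})
counts["idx33"] = 1  # a single stray read lands on one water control

rate = misassignment_rate(counts, control_indexes)
print(f"rate = {rate:.2e} ({rate * 100:.6f}%)")
```

With these invented counts the rate is on the order of one read in tens of millions, which is the scale the abstract reports for single-index mis-assignment.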


“…To address these issues, we performed library barcoding with dual unique indexes [32]. Tru-seq HT style combinatorial indexing using D7xx/D5xx pairs resulted in a median contamination of 0.5% (maximum 8.9%).…”

Section: Ultra-rare Variant Calls Enabled By QCTs (mentioning)

confidence: 99%

A novel high-throughput molecular counting method with single base-pair resolution enables accurate single-gene NIPT

Tsao, Silas, Landry, et al. 2019

Preprint


Next-generation DNA sequencing is currently limited by an inability to count the number of input DNA molecules. Molecular counting is particularly needed when accurate quantification is required for diagnostic purposes, such as in single-gene non-invasive prenatal testing (sgNIPT) and liquid biopsy. We developed Quantitative Counting Template (QCT) molecular counting for reconstructing the number of input DNA molecules using sequencing data. We then used QCT molecular counting to develop sgNIPT of sickle cell disease, cystic fibrosis, spinal muscular atrophy, alpha-thalassemia, and beta-thalassemia. Incorporating molecular count information into a statistical model of disease likelihood led to analytical sensitivity and specificity of >98% and >99%, respectively. Validation of sgNIPT was further performed with maternal blood samples collected during pregnancy, and sgNIPT was 100% concordant with newborn follow-up.
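The citation statement for this entry contrasts combinatorial indexing with unique dual indexes. A minimal sketch of why the two behave differently at demultiplexing is given below. It assumes a simplified exact-match demultiplexer, and the index sequences, sample names, and sheet layouts are hypothetical rather than taken from any kit or from the cited work. With unique dual indexes, a read whose i5 index has been swapped forms a pair that no sample uses, so it can be discarded. With combinatorial indexes, the same swapped pair is a valid combination and is silently credited to another sample.

```python
from collections import Counter

# Hypothetical unique dual index sheet: no i7 or i5 is shared between samples.
UDI_SHEET = {
    ("AAGG", "CTTA"): "sample_A",
    ("CCTT", "GAAC"): "sample_B",
}

# Hypothetical combinatorial sheet: every i7/i5 combination is a valid sample.
COMBINATORIAL_SHEET = {
    ("AAGG", "CTTA"): "S1",
    ("AAGG", "GAAC"): "S2",
    ("CCTT", "CTTA"): "S3",
    ("CCTT", "GAAC"): "S4",
}


def demultiplex(reads, sheet):
    """Count reads per sample; index pairs absent from the sheet are undetermined."""
    counts = Counter()
    for i7, i5 in reads:
        counts[sheet.get((i7, i5), "undetermined")] += 1
    return counts


# Three intact reads plus one read whose i5 index was swapped ("hopped").
hopped_reads = [("AAGG", "CTTA")] * 3 + [("AAGG", "GAAC")]

print("unique dual:  ", dict(demultiplex(hopped_reads, UDI_SHEET)))
print("combinatorial:", dict(demultiplex(hopped_reads, COMBINATORIAL_SHEET)))
```

Real demultiplexers also tolerate a small number of index mismatches, which this sketch leaves out for brevity.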


“…However, as several studies have recently shown (Sinha et al, 2017; Costello et al, 2018), multiplexing leads to incorrect sample assignment of a significant fraction of demultiplexed sequencing reads. Out of several mechanisms that can introduce sample index misassignment (MacConaill et al, 2018), the presence of free-floating indexing primers that attach to the pooled cDNA fragments just before the exclusion amplification step in patterned sequencing flowcells has been shown to be the main culprit (Illumina, 2017). This phenomenon is known as sample index hopping and results in a data cross-contamination artifact that takes the form of phantom molecules, molecules that exist only in the data by virtue of read misassignment (Figure 1a).…”

Section: Précis (mentioning)

confidence: 99%

“…Unfortunately, sample multiplexing can cause a significant percentage of the demultiplexed sequenced reads to be misassigned to an incorrect sample barcode. Although sample read misassignments can arise due to several factors (MacConaill et al, 2018), one specific mechanism termed sample index hopping is the primary cause of read misassignments in patterned flow cells. Index hopping is believed to result from the presence of free-floating indexing primers that attach to the pooled cDNA fragments just before the exclusion amplification step that generates clusters on the flow cell.…”

Section: Overview Of Sample Index Hopping On Illumina's Sequencers (mentioning)

confidence: 99%

Statistical modeling, estimation, and remediation of sample index hopping in multiplexed droplet-based single-cell RNA-seq data

Farouni, Djambazian, Ragoussis, et al. 2019

Preprint


Sample index hopping can substantially confound the analysis of multiplexed sequencing data due to the resulting erroneous assignment of some, or even all, of the sequencing reads generated by a cDNA fragment in a given sample to other samples. In those target samples, the data cross-contamination artifact takes the form of "phantom molecules", molecules that exist only in the data by virtue of read misassignment. The presence of phantom molecules in droplet-based single-cell RNA-seq data should be a cause of great concern since they can introduce both phantom cells and artifactual differentially-expressed genes in downstream analyses. More importantly, even when the index hopping rate is very small, the fraction of phantom molecules in the entire dataset can be high due to the distributional properties of sequencing reads across samples. To our knowledge, current computational methods can neither accurately estimate the underlying rate of index hopping nor adequately correct for the resultant misassignment in droplet-based single cell RNA-seq data. Here, we introduce a probabilistic model that formalizes the phenomenon of index hopping and allows the accurate estimation of its rate. Application of the proposed model to several multiplexed datasets suggests that the sample index hopping probability for a given read ranges from 0.003 to 0.009, arguably low numbers, even though, counter-intuitively, they can give rise to a large fraction of phantom molecules (as high as 85%) in any given sample. We also present a model-based approach for inferring the true sample of origin of the reads that are affected by index hopping, thus allowing the purging of the majority of phantom molecules in the data. Using empirical and simulated data, we show that we can reassign reads to their true sample of origin and remove predicted phantom molecules through a principled probabilistic procedure that optimally minimizes the false positive rate. Thus, even though sample index hopping often substantially compromises single-cell RNA-seq data, it is possible to accurately quantify, detect, and reassign the affected reads and remove the phantom molecules generated by index hopping. Code and reproducible analysis notebooks are available at https://github.com/csglab/phantom_purge.
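One way to see why a per-read hopping probability below 1% can still produce many phantom molecules is a back-of-the-envelope calculation. The sketch below is not the probabilistic model from the abstract. It is a deliberately simplified toy that assumes every molecule is sequenced with the same number of reads, that each read hops independently with probability `p_hop`, and that a hopped read lands uniformly in one of the other samples. The function and parameter names are invented for the illustration.

```python
def expected_phantom_fraction(p_hop, reads_per_molecule, n_samples):
    """Expected share of observed molecules that are phantoms (toy model)."""
    others = n_samples - 1
    # Chance that at least one read of a molecule lands in one given other sample.
    p_hit = 1.0 - (1.0 - p_hop / others) ** reads_per_molecule
    # Expected number of phantom copies created by one real molecule.
    phantoms = others * p_hit
    # Chance the real molecule keeps at least one read in its own sample.
    real = 1.0 - p_hop ** reads_per_molecule
    return phantoms / (phantoms + real)


# Hopping probability inside the 0.003 to 0.009 range quoted in the abstract.
for n_reads in (1, 10, 100):
    share = expected_phantom_fraction(0.005, n_reads, n_samples=8)
    print(f"{n_reads:>3} reads per molecule: phantom share = {share:.3f}")
```

Under these assumptions the phantom share equals the hopping probability when each molecule has a single read, and it grows with the number of reads per molecule, because every additional read is another chance to seed a copy of the molecule in some other sample. Real datasets have skewed read distributions across samples, which the abstract identifies as the driver of the much higher fractions it reports.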
