Enabling high-accuracy long-read amplicon sequences using unique molecular identifiers with Nanopore or PacBio sequencing

Karst, Søren Michael; Ziels, Ryan M.; Kirkegaard, Rasmus Hansen; Albertsen, Mads

doi:10.1101/645903

Cited by 55 publications

(57 citation statements)

References 82 publications


“…Emerging single-molecule (third-generation/long-read) sequencing technologies including Pacific Biosciences ("PacBio") and Oxford Nanopore Technologies ("nanopore") produce reads 10s to 100s of kilobases in length, often without prior amplification. Recent work [14][15][16][17] has demonstrated the utility of these longer reads for sequencing the entire 16S gene or entire rRNA operon, with corresponding increase in taxonomic resolution by capturing more variable sequence, including all nine variable regions of 16S, the internal transcribed spacers (ITS), and 23S. To support the extension of these approaches to characterize strain-level variation of tissue-associated (mucosa-associated) microbiota contributing to IBD and in models of experimental colitis in mice, with a particular focus on mechanistic studies of AIEC, we produced complete genome assemblies for eight AIEC and non-AIEC E. coli, describe the genomic variation among these strains, particularly within the rRNA operon, and demonstrate the accurate identification of these strains in mixed in vitro and in vivo microbiota.…”

Section: Introduction (mentioning)

confidence: 99%

Long-read sequencing to interrogate strain-level variation among adherent-invasive Escherichia coli isolated from human intestinal tissue

Wang, Bleich, Zarmer et al. 2020

Preprint


Adherent-invasive Escherichia coli (AIEC) are a pathovar linked to inflammatory bowel diseases (IBD), especially Crohn's disease, and colorectal cancer. AIEC have no known molecular or genomic markers, but instead are defined by in vitro functional attributes. Furthermore, it is unknown if strains classified as AIEC truly colonize intestinal tissues better than non-AIEC strains. To evaluate strain-level variation among tissue-associated E. coli, we must develop a sequencing approach capable of long reads and with the ability to exclude mammalian DNA. We also must evaluate genomic variation among strains that have demonstrated ability to colonize intestinal tissues. Here we have assembled complete genomes using ultra-long-read nanopore sequencing for a model AIEC strain, NC101, and seven strains isolated from the intestinal mucosa of Crohn's disease and non-Crohn's tissues. We show these strains can colonize the intestinal tissue in a Crohn's disease mouse model and induce varying levels of inflammatory cytokines from cultured macrophages. We demonstrate these strains can be quantified and distinguished in the presence of 99.5% mammalian DNA and from within a fecal population. Analysis of global genomic structure and specific sequence variation within the ribosomal RNA operon provides a framework for efficiently tracking strain-level variation of closely-related E. coli and likely other commensal/pathogenic bacteria impacting intestinal inflammation in mice and IBD patients.



“…Our targeted sequencing protocol can capture long deletions, uses the same amplicon for the whole locus, and allows sample multiplexing. Unique molecule counting methods [48] for long reads [49,50] could be incorporated to reduce PCR biases. Established protocols [28] are available for shorter (100-300 bp) target regions.…”

Section: Discussion (mentioning)

confidence: 99%

Parallel genetics of regulatory sequences in vivo

Froehlich, Uyar, Herzog et al. 2020

Preprint


Understanding how regulatory sequences control gene expression is fundamental to explain how phenotypes arise in health and disease. Traditional reporter assays inform about function of individual regulatory elements, typically in isolation. However, regulatory elements must ultimately be understood by perturbing them within their genomic environment and developmental- or tissue-specific contexts. This is technically challenging; therefore, few regulatory elements have been characterized in vivo. Here, we used inducible Cas9 and multiplexed guide RNAs to create hundreds of mutations in enhancers/promoters and 3′ UTRs of 16 genes in C. elegans. To quantify the consequences of mutations on expression, we developed a targeted RNA sequencing strategy across hundreds of mutant animals. We were also able to systematically and quantitatively assign fitness cost to mutations. Finally, we identified and characterized sequence elements that strongly regulate phenotypic traits. Our approach enables highly parallelized, functional analysis of regulatory sequences in vivo.
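The citing statement above suggests that unique molecule counting could reduce PCR biases. As a minimal sketch of that general idea (not the protocol of either paper), the Python below assumes each read has already been assigned a UMI and an allele label; the function name `count_molecules` and the toy data are hypothetical. Reads sharing a UMI are treated as PCR copies of one template molecule, so each UMI is counted once per allele.

```python
from collections import Counter, defaultdict


def count_molecules(reads):
    """Return (read_counts, molecule_counts) per allele.

    `reads` is an iterable of (umi, allele) pairs. This input format is an
    assumption for illustration; real pipelines extract UMIs from the reads
    and usually tolerate sequencing errors within the UMI itself.
    """
    reads = list(reads)  # allow generators, since we iterate twice

    # Raw read counts are inflated by uneven PCR amplification.
    read_counts = Counter(allele for _, allele in reads)

    # Collapse PCR duplicates: each distinct UMI counts once per allele.
    umis_per_allele = defaultdict(set)
    for umi, allele in reads:
        umis_per_allele[allele].add(umi)
    molecule_counts = {a: len(u) for a, u in umis_per_allele.items()}

    return dict(read_counts), molecule_counts


# Toy data: one wild-type molecule was over-amplified (three copies of AACG).
reads = [
    ("AACG", "wild_type"), ("AACG", "wild_type"), ("AACG", "wild_type"),
    ("TTGC", "wild_type"),
    ("GGAT", "deletion"),
    ("CCTA", "deletion"),
]
read_counts, molecule_counts = count_molecules(reads)
print(read_counts)      # wild_type looks twice as abundant by reads
print(molecule_counts)  # both alleles come from two molecules each
```

In this toy case the read counts suggest a 2:1 ratio, while the UMI-collapsed counts give 1:1, which is the kind of correction the statement alludes to.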


“…Even if databases contained sequences with up to 99% identity to the analysed species, further improvements could often be made by adding closer reference sequences (Figure 6). When the consensus sequence was constructed, however, taxonomic identification based on the obtained consensus sequence was far less sensitive to database […] relied on rather laborious wet-lab procedures such as rolling cycle amplification or unique tagging of the individual amplicons before sequencing [25,26]. Unlike previous studies we specifically designed our workflow for clinical routine applications.…”

Section: Discussion (mentioning)

confidence: 99%

“…One obstacle for a broad adoption of nanopore sequencing in routine diagnostic laboratories is the added bioinformatic complexity as compared to established Sanger sequencing workflows. Furthermore, available workflows are often limited to the analysis of pure amplicons [20][21][22][23], include complex modifications of the ONT laboratory workflows [25,26], or lack published validation by using samples other than mock communities [27,28].…”

Section: Introduction (mentioning)

confidence: 99%

A sample-to-report solution for taxonomic identification of cultured bacteria in the clinical setting based on nanopore sequencing

Neuenschwander, Miani, Amlang et al. 2019

Preprint


Amplicon sequencing of 16S rRNA gene is commonly used for the identification of bacterial isolates in diagnostic laboratories, and mostly relies on the Sanger sequencing method. The latter, however, suffers from a number of limitations with the most significant being the inability to resolve mixed amplicons when closely related species are co-amplified from a mixed culture. This often leads to either increased turnover time or absence of usable sequence data. Short-read NGS technologies could address the mixed amplicon issue, but would lack both cost efficiency at low throughput and fast turnaround times. Nanopore sequencing developed by Oxford Nanopore Technologies (ONT) could solve those issues by enabling flexible number of samples per run and adjustable sequencing time. Here we report on the development of a standardized laboratory workflow combined with a fully automated analysis pipeline LORCAN (Long Read Consensus ANalysis), which together provide a sample-to-report solution for amplicon sequencing and taxonomic identification of the resulting consensus sequences. Validation of the approach was conducted on a panel of reference strains and on clinical samples consisting of single or mixed rRNA amplicons associated with various bacterial genera by direct comparison to the corresponding Sanger sequences. Additionally, artificial read mixtures of closely related species were used to assess LORCAN's behaviour when dealing with samples with known cross-contamination level. We demonstrate that by combining ONT amplicon sequencing results with LORCAN, the accuracy of Sanger sequencing can be closely matched (>99.6% sequence identity) and that mixed samples can be resolved at the single base resolution level. The presented approach has the potential to significantly improve the flexibility, reliability and availability of amplicon sequencing in diagnostic settings.
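The abstract above reports consensus sequences that match Sanger references at >99.6% sequence identity. The following is a rough Python sketch of what a consensus call and an identity figure can mean in the simplest case; it is not LORCAN's actual method. It assumes the reads are already aligned to equal length with no gaps, and the function names `majority_consensus` and `percent_identity` are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Per-column majority vote over equal-length, pre-aligned reads.

    Assumption: reads are already aligned and gap-free. Real consensus
    tools align reads first and handle insertions and deletions.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must have equal length")
    # zip(*reads) yields one tuple of bases per alignment column.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


def percent_identity(seq_a, seq_b):
    """Percentage of matching positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Toy data: two reads each carry one error at a different position.
reads = ["ACGTACGT", "ACGTACCT", "ACGAACGT"]
reference = "ACGTACGT"  # stands in for a Sanger sequence

consensus = majority_consensus(reads)
print(consensus)
print(percent_identity(consensus, reference))
print(percent_identity(reads[1], reference))
```

Because the errors in the toy reads fall at different positions, the vote recovers the reference even though individual reads do not match it fully.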
