Linkage disequilibrium maps to guide contig ordering for genome assembly

Pengelly, Reuben J.; Collins, Andrew

doi:10.1093/bioinformatics/bty687

Cited by 6 publications

(6 citation statements)

References 26 publications

“…The profound problem is that the physical connectivity is lost during sequencing and recovering that in the assembly stage is notoriously difficult. To this end, high‐quality linkage maps are valuable and allow inferring the physical order and orientation of the assembled contigs (Pengelly & Collins, 2019; Rastas, 2020; Stemple, 2013).…”

Section: Introduction (mentioning)

confidence: 99%

Automated improvement of stickleback reference genome assemblies with Lep‐Anchor software

Kivikoski, Rastas, Löytynoja, et al. 2021

Molecular Ecology Resources

A great deal of present-day research in biology is based on genomic data that are processed and analysed in the context of a linear reference genome. Typical examples of this are whole-genome sequencing studies where sequencing reads are mapped to the reference genome and the characteristics of interest are derived from local dissimilarities and statistics based on the alignments (Korneliussen et al., 2014; Schraiber & Akey, 2015). Reliability of those characteristics and the conclusions drawn from them depend not only on the quality of the sequencing data but also on the quality of the reference genome. Assembling and evaluating the quality of reference genomes is not easy (Baker, 2012;

“…Linkage disequilibrium maps [39,43] are constructed from population data but are closely analogous to the genetic linkage map because LD structure is determined to a large degree by accumulated recombination events. Pengelly and Collins [44] describe a method for ordering, orienting and positioning sequenced contigs using LD maps. The maps are constructed using SNP genotype data from unrelated individuals.…”

Section: Ordering and Orientation By Linkage Disequilibrium (mentioning)

confidence: 99%

The Challenge of Genome Sequence Assembly

Collins

2018

TOBIOIJ

Self Cite

Background: Although whole genome sequencing is enabling numerous advances in many fields, achieving complete chromosome-level sequence assemblies for diverse species presents difficulties. The problems in part reflect the limitations of current sequencing technologies. Chromosome assembly from ‘short read’ sequence data is confounded by the presence of repetitive genome regions with numerous similar sequence tracts which cannot be accurately positioned in the assembled sequence. Longer sequence reads often have higher error rates and may still be too short to span the larger gaps between contigs. Objective: Given the emergence of exciting new applications using sequencing technology, such as the Earth BioGenome Project, it is necessary to further develop and apply a range of strategies to achieve robust chromosome-level sequence assembly. Reviewed here are a range of methods to enhance assembly which include the use of cross-species synteny to understand relationships between sequence contigs, the development of independent genetic and/or physical scaffold maps as frameworks for assembly (for example, radiation hybrid, optical motif and chromatin interaction maps) and the use of patterns of linkage disequilibrium to help position, orient and locate contigs. Results and Conclusion: A range of methods exist which might be further developed to facilitate cost-effective large-scale sequence assembly for diverse species. A combination of strategies is required to best assemble sequence data into chromosome-level assemblies. There are a number of routes towards the development of maps which span chromosomes (including physical, genetic and linkage disequilibrium maps) and construction of these whole chromosome maps greatly facilitates the ordering and orientation of sequence contigs.
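To make "patterns of linkage disequilibrium" concrete, the following is a minimal sketch of the common pairwise r² statistic between two biallelic SNPs, computed from phased haplotypes coded as 0/1. This is an illustration only and is not the LD map construction described by Pengelly and Collins; the function name `ld_r2` and the 0/1 haplotype encoding are assumptions made for this example.

```python
def ld_r2(hap_a, hap_b):
    """Pairwise LD (r^2) between two biallelic SNPs.

    hap_a, hap_b: equal-length sequences of 0/1 alleles, one entry per
    phased haplotype (hypothetical encoding chosen for this sketch).
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and equal length")

    p_a = sum(hap_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n  # frequency of allele 1 at SNP B
    # frequency of haplotypes carrying allele 1 at both SNPs
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # r^2 is undefined when a SNP is monomorphic; this sketch returns 0.0
        return 0.0
    return d * d / denom
```

SNP pairs that lie close together on the same contig tend to show higher values than distant pairs, which is the kind of signal that LD-based ordering approaches build on.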

“…Programs like LDMAP (Pengelly and Collins 2019) and LDscaff (Zhao et al 2020) were developed to address this task based on allele frequencies. In this study, we infer haplotype blocks based on the comparison of variation patterns without taking allele frequencies into account and demonstrate that this approach leads to substantial improvements in the contiguity of the quinoa genome assembly.…”

Section: Introduction (mentioning)

confidence: 99%

Quinoa genome assembly employing genomic variation for guided scaffolding

et al. 2021

Key message: We propose to use the natural variation between individuals of a population for genome assembly scaffolding. In today’s genome projects, multiple accessions get sequenced, leading to variant catalogs. Using such information to improve genome assemblies is attractive both cost-wise as well as scientifically, because the value of an assembly increases with its contiguity. We conclude that haplotype information is a valuable resource to group and order contigs toward the generation of pseudomolecules.

Abstract: Quinoa (Chenopodium quinoa) has been under cultivation in Latin America for more than 7500 years. Recently, quinoa has gained increasing attention due to its stress resistance and its nutritional value. We generated a novel quinoa genome assembly for the Bolivian accession CHEN125 using PacBio long-read sequencing data (assembly size 1.32 Gbp, initial N50 size 608 kbp). Next, we re-sequenced 50 quinoa accessions from Peru and Bolivia. This set of accessions differed at 4.4 million single-nucleotide variant (SNV) positions compared to CHEN125 (1.4 million SNV positions on average per accession). We show how to exploit variation in accessions that are distantly related to establish a genome-wide ordered set of contigs for guided scaffolding of a reference assembly. The method is based on detecting shared haplotypes and their expected continuity throughout the genome (i.e., the effect of linkage disequilibrium), as an extension of what is expected in mapping populations where only a few haplotypes are present. We test the approach using Arabidopsis thaliana data from different populations. After applying the method on our CHEN125 quinoa assembly we validated the results with mate-pairs, genetic markers, and another quinoa assembly originating from a Chilean cultivar. We show consistency between these information sources and the haplotype-based relations as determined by us and obtain an improved assembly with an N50 size of 1079 kbp and ordered contig groups of up to 39.7 Mbp. We conclude that haplotype information in distantly related individuals of the same species is a valuable resource to group and order contigs according to their adjacency in the genome toward the generation of pseudomolecules.

Abstract: Supplementary data are available at Bioinformatics online.