Comprehensive whole-genome structural variation detection is challenging with current approaches. With diploid cells as the DNA source and the presence of numerous repetitive elements, short-read DNA sequencing cannot be used to detect structural variation efficiently. In this report, we show that genome mapping with long, fluorescently labeled DNA molecules imaged on nanochannel arrays can be used for whole-genome structural variation detection without sequencing. While whole-genome haplotyping is not achieved, local phasing (across >150-kb regions) is routine, as molecules from the parental chromosomes are examined separately. In one experiment, we generated genome maps from a trio from the 1000 Genomes Project, compared the maps against the map derived from the reference human genome, and identified structural variations that are >5 kb in size. We find that these individuals have many more structural variants than those published, including some with the potential of disrupting gene function or regulation.
Motivation: Optical mapping is a technique for capturing fluorescent signal patterns of long DNA molecules (in the range of 0.1–1 Mbp). Recently, it has been complementing the widely used short-read sequencing technology by assisting with scaffolding and detecting large and complex structural variations (SVs). Here, we introduce a fast, robust and accurate tool called OMBlast for aligning optical maps, the set of signal locations on the molecules generated from optical mapping. Our method is based on the seed-and-extend approach from sequence alignment, with modifications specific to optical mapping. Results: Experiments with both synthetic and our real data demonstrate that OMBlast has higher accuracy and faster mapping speed than existing alignment methods. Our tool also shows significant improvement when aligning data with SVs. Availability and Implementation: OMBlast is implemented for Java 1.7 and is released under a GPL license. OMBlast can be downloaded from https://github.com/aldenleung/OMBlast and run directly on machines equipped with a Java virtual machine. Supplementary information: Supplementary data are available at Bioinformatics online.
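To make the seed-and-extend idea above concrete, here is a minimal sketch in Python. It is not OMBlast's implementation: the function names (`segments`, `similar`, `find_seeds`, `extend`, `align`), the seed length `k`, and the tolerance values are all assumptions chosen for illustration. It also ignores missing or extra signals, which a real optical map aligner has to handle.

```python
from typing import List, Optional, Tuple


def segments(positions: List[float]) -> List[float]:
    """Convert sorted signal positions (bp) into inter-signal segment lengths."""
    return [b - a for a, b in zip(positions, positions[1:])]


def similar(a: float, b: float, rel_tol: float = 0.1, abs_tol: float = 500.0) -> bool:
    """Hypothetical tolerance: allow a fixed measurement error plus a scaling error."""
    return abs(a - b) <= abs_tol + rel_tol * max(a, b)


def find_seeds(ref: List[float], qry: List[float], k: int = 3) -> List[Tuple[int, int]]:
    """Return (ref_index, qry_index) pairs where k consecutive segments match."""
    seeds = []
    for i in range(len(ref) - k + 1):
        for j in range(len(qry) - k + 1):
            if all(similar(ref[i + d], qry[j + d]) for d in range(k)):
                seeds.append((i, j))
    return seeds


def extend(ref: List[float], qry: List[float], i: int, j: int, k: int = 3) -> Tuple[int, int, int, int]:
    """Grow a seed left and right while neighbouring segments still match.

    Returns half-open segment index ranges (ref_start, ref_end, qry_start, qry_end).
    """
    lo_r, lo_q = i, j
    while lo_r > 0 and lo_q > 0 and similar(ref[lo_r - 1], qry[lo_q - 1]):
        lo_r -= 1
        lo_q -= 1
    hi_r, hi_q = i + k, j + k
    while hi_r < len(ref) and hi_q < len(qry) and similar(ref[hi_r], qry[hi_q]):
        hi_r += 1
        hi_q += 1
    return lo_r, hi_r, lo_q, hi_q


def align(ref_positions: List[float], qry_positions: List[float], k: int = 3) -> Optional[Tuple[int, int, int, int]]:
    """Return the longest extended hit, or None if no seed is found."""
    ref, qry = segments(ref_positions), segments(qry_positions)
    best = None
    for i, j in find_seeds(ref, qry, k):
        hit = extend(ref, qry, i, j, k)
        if best is None or hit[1] - hit[0] > best[1] - best[0]:
            best = hit
    return best


# Toy data (made up): the molecule covers reference segments 1..4.
reference = [0, 10000, 25000, 31000, 50000, 62000, 90000]
molecule = [0, 15200, 21100, 40500, 52300]
result = align(reference, molecule)
```

Working on segment lengths rather than absolute positions means a molecule can be matched without knowing where on the reference it starts, which is why the seed is defined over consecutive segments.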
We present a new method, OMSV, for accurately and comprehensively identifying structural variations (SVs) from optical maps. OMSV detects both homozygous and heterozygous SVs, SVs of various types and sizes, and SVs with or without creating or destroying restriction sites. We show that OMSV has high sensitivity and specificity, with clear performance gains over the latest method. Applying OMSV to a human cell line, we identified hundreds of SVs >2 kbp, with 68% of them missed by sequencing-based callers. Independent experimental validation confirmed the high accuracy of these SVs. The OMSV software is available at http://yiplab.cse.cuhk.edu.hk/omsv/. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-017-1356-2) contains supplementary material, which is available to authorized users.
Human genomes contain structural variations (SVs) that are associated with various phenotypic variations and diseases. SV detection by sequencing is incomplete due to limited read length. Nanochannel-based optical mapping (OM) allows direct observation of SVs up to hundreds of kilobases in size on individual DNA molecules, making it a promising alternative technology for identifying large SVs. SV detection from optical maps is non-trivial due to complex types of error present in OM data, and no existing methods can simultaneously handle all these complex errors and the wide spectrum of SV types. Here we present a novel method, OMSV, for accurate and comprehensive identification of SVs from optical maps. OMSV detects both homozygous and heterozygous SVs, SVs of various types and sizes, and SVs with and without creating/destroying restriction sites. In an extensive series of tests based on real and simulated data, OMSV achieved both high sensitivity and specificity, with clear performance gains over the latest existing method. Applying OMSV to a human cell line, we identified hundreds of SVs >2 kbp, with 65% of them missed by sequencing-based callers. Independent experimental validations confirmed the high accuracy of these SVs. We also demonstrate how OMSV can incorporate sequencing data to determine precise SV break points and novel sequences in the SVs not contained in the reference. We provide OMSV as open-source software to facilitate systematic studies of large SVs.
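As a rough illustration of how large insertions and deletions can show up in an aligned optical map, the sketch below compares the distance between consecutive matched labels on a molecule with the corresponding distance on the reference. This is not OMSV's algorithm: the function name `call_indels`, the input layout, and the fixed 2 kbp threshold are assumptions made for this example. A practical caller would combine evidence from many molecules and model the sizing errors mentioned above.

```python
from typing import List, Tuple

# Each aligned label is (reference_position_bp, molecule_position_bp).
AlignedLabel = Tuple[int, int]
# Each call is (sv_type, ref_start_bp, ref_end_bp, size_difference_bp).
IndelCall = Tuple[str, int, int, int]


def call_indels(aligned: List[AlignedLabel], min_size: int = 2000) -> List[IndelCall]:
    """Flag intervals whose molecule length differs from the reference length.

    `aligned` must be sorted along the molecule. An interval that is longer on
    the molecule by at least `min_size` is reported as an insertion; one that
    is shorter by at least `min_size` is reported as a deletion.
    """
    calls = []
    for (r0, m0), (r1, m1) in zip(aligned, aligned[1:]):
        diff = (m1 - m0) - (r1 - r0)
        if diff >= min_size:
            calls.append(("insertion", r0, r1, diff))
        elif diff <= -min_size:
            calls.append(("deletion", r0, r1, -diff))
    return calls


# Toy alignment (made-up coordinates) with one insertion and one deletion.
aligned_labels = [
    (100000, 0),
    (115000, 15100),
    (121000, 26000),
    (140000, 45200),
    (160000, 62000),
]
indels = call_indels(aligned_labels)
```

Because the comparison uses only distances between labels, this kind of check can flag an indel even when no label is created or destroyed, although it localizes the event only to the interval between two labels.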