Massively parallel DNA sequencing technologies are revolutionizing genomics by making it possible to generate billions of relatively short (∼100-base) sequence reads at very low cost. Whereas such data can be readily used for a wide range of biomedical applications, it has proven difficult to use them to generate high-quality de novo genome assemblies of large, repeat-rich vertebrate genomes. To date, the genome assemblies generated from such data have fallen far short of those obtained with the older (but much more expensive) capillary-based sequencing approach. Here, we report the development of an algorithm for genome assembly, ALLPATHS-LG, and its application to massively parallel DNA sequence data from the human and mouse genomes, generated on the Illumina platform. The resulting draft genome assemblies have good accuracy, short-range contiguity, long-range connectivity, and coverage of the genome. In particular, the base accuracy is high (≥99.95%) and the scaffold sizes (N50 size = 11.5 Mb for human and 7.2 Mb for mouse) approach those obtained with capillary-based sequencing. The combination of improved sequencing technology and improved computational methods should now make it possible to dramatically increase the de novo sequencing of large genomes. The ALLPATHS-LG program is available at http://www.broadinstitute.org/science/programs/genome-biology/crd.
The high-quality assembly of a genome sequence is a critical foundation for understanding the biology of an organism, the genetic variation within a species, or the pathology of a tumor. High-quality assembly is particularly challenging for large, repeat-rich genomes such as those of mammals. Among mammals, "finished" genome sequences have been completed for the human and the mouse (1, 2).
However, for most large genomes, efforts have focused on using shotgun-sequencing data to produce high-quality draft genome assemblies, with short-range contiguity in the range of 20-100 kb and long-range connectivity in the range of 10 Mb (e.g., refs. 3-5). Using traditional capillary-based sequencing, such assemblies have been produced for multiple mammals at a cost of tens of millions of dollars each.
Recently, there has been a revolution in DNA sequencing technology. New massively parallel technologies can produce DNA sequence information at a per-base cost that is ∼100,000-fold lower than a decade ago (6, 7). In principle, this should make it possible to dramatically decrease the cost of generating high-quality draft genome assemblies. In practice, however, this has been difficult because the new technology produces sequencing "reads" of only ∼100 bases in length (compared with >700 bases for capillary-based technology). These shorter reads are also less accurate. For both of these reasons, these data are more difficult to assemble into long contiguous and connected sequence. Excellent de novo assemblies using massively parallel sequence data have been reported for microbes with genomes up to 40 Mb (refs. 8-10 and many others). There have been some important pioneering e...
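The ALLPATHS-LG abstract reports contiguity as an N50 size (11.5 Mb for human scaffolds, 7.2 Mb for mouse). As a minimal sketch of the standard definition, not the authors' own tooling, N50 is the length L such that sequences of length ≥ L together contain at least half of all assembled bases. The function name `n50` and the example lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence (contig or scaffold) lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases until
    # at least half of the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test avoids float rounding
            return length


# Hypothetical scaffold lengths (any consistent unit, e.g. Mb).
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Because N50 is weighted by bases rather than by sequence count, a handful of very long scaffolds can dominate it. This is why a large N50 reflects long-range connectivity even when an assembly also contains many short pieces.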
Influenza virus remains a constant public health threat, owing to its ability to evade immune surveillance through rapid genetic drift and reassortment. Monoclonal antibody (mAb)-based immunotherapy is a promising strategy for disease control. Here we use a human antibody phage display library and the H5 hemagglutinin (HA) ectodomain to select ten neutralizing mAbs (nAbs) with remarkably broad activity across Group 1 influenza viruses, including the H5N1 “bird flu” and the H1N1 “Spanish flu” strains. Notably, nine of the antibodies use the same germline gene, VH1-69. The crystal structure of one mAb bound to H5N1 HA reveals that only the heavy chain inserts into a highly conserved pocket in the HA stem, inhibiting the conformational changes required for membrane fusion. Our studies indicate that nAbs targeting this pocket could provide broad protection against both seasonal and pandemic influenza A infections.
Despite the ever-increasing output of Illumina sequencing data, loci with extreme base compositions are often under-represented or absent. To evaluate sources of base-composition bias, we used quantitative PCR to trace genomic sequences ranging from 6% to 90% GC through the sample-preparation and sequencing process. We identified PCR during library preparation as a principal source of bias and optimized the PCR conditions. Our improved protocol significantly reduces amplification bias and minimizes the previously severe effects of the PCR instrument and temperature ramp rate.
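The Illumina study above characterizes loci by GC content (6% to 90% GC). As an illustrative sketch only, and not the qPCR-based method the authors used, the GC fraction of a sequence, or of sliding windows along it, can be computed as below. The helper names `gc_fraction` and `sliding_gc` are hypothetical, and excluding ambiguous bases such as `N` from the denominator is an assumption.

```python
def gc_fraction(seq):
    """Fraction of G/C among unambiguous A/C/G/T bases (case-insensitive).

    Assumption: ambiguous symbols such as N are ignored entirely.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return gc / (gc + at)


def sliding_gc(seq, window, step):
    """GC fraction of each full-length window, starting every `step` bases."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


print(gc_fraction("ATGCNN"))      # 0.5
print(sliding_gc("AAGG", 2, 1))   # [0.0, 0.5, 1.0]
```

Binning loci by a value like this is one simple way to compare sequencing coverage across the composition range. A window made up only of ambiguous bases raises an error rather than silently returning 0.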
Recurrent mutations in the spliceosome are observed in several human cancers, but their functional and therapeutic significance remains elusive. SF3B1, the most frequently mutated component of the spliceosome in cancer, is involved in the recognition of the branch point sequence (BPS) during selection of the 3' splice site (ss) in RNA splicing. Here, we report that common and tumor-specific splicing aberrations are induced by SF3B1 mutations and establish aberrant 3' ss selection as the most frequent splicing defect. Strikingly, mutant SF3B1 utilizes a BPS that differs from that used by wild-type SF3B1 and requires the canonical 3' ss to enable aberrant splicing during the second step. Approximately 50% of the aberrantly spliced mRNAs are subjected to nonsense-mediated decay resulting in downregulation of gene and protein expression. These findings ascribe functional significance to the consequences of SF3B1 mutations in cancer.