The bacterium Escherichia coli O157:H7 is a worldwide threat to public health and has been implicated in many outbreaks of haemorrhagic colitis, some of which included fatalities caused by haemolytic uraemic syndrome. Close to 75,000 cases of O157:H7 infection are now estimated to occur annually in the United States. The severity of disease, the lack of effective treatment and the potential for large-scale outbreaks from contaminated food supplies have propelled intensive research on the pathogenesis and detection of E. coli O157:H7 (ref. 4). Here we have sequenced the genome of E. coli O157:H7 to identify candidate genes responsible for pathogenesis, to develop better methods of strain detection and to advance our understanding of the evolution of E. coli, through comparison with the genome of the non-pathogenic laboratory strain E. coli K-12 (ref. 5). We find that lateral gene transfer is far more extensive than previously anticipated. In fact, 1,387 new genes encoded in strain-specific clusters of diverse sizes were found in O157:H7. These include candidate virulence factors, alternative metabolic capacities, several prophages and other new functions, all of which could be targets for surveillance.
The seminal importance of DNA sequencing to the life sciences, biotechnology and medicine has driven the search for more scalable and lower-cost solutions. Here we describe a DNA sequencing technology in which scalable, low-cost semiconductor manufacturing techniques are used to make an integrated circuit able to directly perform non-optical DNA sequencing of genomes. Sequence data are obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis using all-natural nucleotides on this massively parallel semiconductor-sensing device or ion chip. The ion chip contains ion-sensitive, field-effect transistor-based sensors in perfect register with 1.2 million wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions. Use of the most widely used technology for constructing integrated circuits, the complementary metal-oxide semiconductor (CMOS) process, allows for low-cost, large-scale production and scaling of the device to higher densities and larger array sizes. We show the performance of the system by sequencing three bacterial genomes, its robustness and scalability by producing ion chips with up to 10 times as many sensors and sequencing a human genome.
DNA sequencing and, more recently, massively parallel DNA sequencing (refs 1-4) has had a profound impact on research and medicine. The reductions in cost and time for generating DNA sequence have resulted in a range of new sequencing applications in cancer (refs 5,6), human genetics (ref. 7), infectious diseases (ref. 8) and the study of personal genomes (refs 9-11), as well as in fields as diverse as ecology (refs 12,13) and the study of ancient DNA (refs 14,15). Although de novo sequencing costs have dropped substantially, there is a desire to continue to drop the cost of sequencing at an exponential rate consistent with the semiconductor industry's Moore's Law (ref. 16) as well as to provide lower cost, faster and more portable devices. This has been operationalized by the desire to reach the $1,000 genome (ref. 17).
To date, DNA sequencing has been limited by its requirement for imaging technology, electromagnetic intermediates (either X-rays (ref. 18) or light (ref. 19)) and specialized nucleotides or other reagents (ref. 20). To overcome these limitations and further democratize the practice of sequencing, a paradigm shift based on non-optical sequencing on newly developed integrated circuits was pursued. Owing to their scalability and low power requirement, CMOS processes are dominant in modern integrated circuit manufacturing (ref. 21). The ubiquitous nature of computers, digital cameras and mobile phones has been made possible by the low-cost production of integrated circuits in CMOS.
Leveraging advances in the imaging field, which has produced large, fast arrays for photonic imaging (ref. 22), we sought a suitable electronic sensor for the construction of an integrated circuit to detect the hydrogen ions that would be released by DNA polymerase (ref. 23) during sequencing by synthesis, as opposed to a sensor designed for the detection of photons. Although a variety ...
We describe the genome sequencing of an anonymous individual of African origin using a novel ligation-based sequencing assay that enables a unique form of error correction that improves the raw accuracy of the aligned reads to >99.9%, allowing us to accurately call SNPs with as few as two reads per allele. We collected several billion mate-paired reads yielding ~18× haploid coverage of aligned sequence and close to 300× clone coverage. Over 98% of the reference genome is covered with at least one uniquely placed read, and 99.65% is spanned by at least one uniquely placed mate-paired clone. We identify over 3.8 million SNPs, 19% of which are novel. Mate-paired data are used to physically resolve haplotype phases of nearly two-thirds of the genotypes obtained and produce phased segments of up to 215 kb. We detect 226,529 intra-read indels, 5590 indels between mate-paired reads, 91 inversions, and four gene fusions. We use a novel approach for detecting indels between mate-paired reads that are smaller than the standard deviation of the insert size of the library and discover deletions in common with those detected with our intra-read approach. Dozens of mutations previously described in OMIM and hundreds of nonsynonymous single-nucleotide and structural variants in genes previously implicated in disease are identified in this individual. There is more genetic variation in the human genome still to be uncovered, and we provide guidance for future surveys in populations and cancer biopsies. [Supplemental material is available online at
Bisulfite sequencing detects 5mC and 5hmC at single-base resolution. However, bisulfite treatment damages DNA, which results in fragmentation, DNA loss, and biased sequencing data. To overcome these problems, enzymatic methyl-seq (EM-seq) was developed. This method detects 5mC and 5hmC using two sets of enzymatic reactions. In the first reaction, TET2 and T4-BGT convert 5mC and 5hmC into products that cannot be deaminated by APOBEC3A. In the second reaction, APOBEC3A deaminates unmodified cytosines by converting them to uracils. Therefore, these three enzymes enable the identification of 5mC and 5hmC. EM-seq libraries were compared with bisulfite-converted DNA, and each library type was ligated to Illumina adaptors before conversion. Libraries were made using NA12878 genomic DNA, cell-free DNA, and FFPE DNA over a range of DNA inputs. The 5mC and 5hmC detected in EM-seq libraries were similar to those of bisulfite libraries. However, libraries made using EM-seq outperformed bisulfite-converted libraries in all specific measures examined (coverage, duplication, sensitivity, etc.). EM-seq libraries displayed even GC distribution, better correlations across DNA inputs, increased numbers of CpGs within genomic features, and accuracy of cytosine methylation calls. EM-seq was effective using as little as 100 pg of DNA, and these libraries maintained the described advantages over bisulfite sequencing. EM-seq library construction, using challenging samples and lower DNA inputs, opens new avenues for research and clinical applications.
Single molecule approaches offer the promise of large, exquisitely miniature ensembles for the generation of equally large data sets. Although microfluidic devices have previously been designed to manipulate single DNA molecules, many of the functionalities they embody are not applicable to very large DNA molecules, normally extracted from cells. Importantly, such microfluidic devices must work within an integrated system to enable high-throughput biological or biochemical analysis, a key measure of any device aimed at the chemical/biological interface and a requirement if large data sets are to be created for subsequent analysis. The challenge here was to design an integrated microfluidic device to control the deposition or elongation of large DNA molecules (up to millimeters in length), which would serve as a general platform for biological/biochemical analysis to function within an integrated system that included massively parallel data collection and analysis. The approach we took was to use replica molding to construct silastic devices to consistently deposit oriented, elongated DNA molecules onto charged surfaces, creating massive single molecule arrays, which we analyzed for both physical and biochemical insights within an integrated environment that created large data sets. The overall efficacy of this approach was demonstrated by the restriction enzyme mapping and identification of single human genomic DNA molecules.
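To make the idea of a restriction map concrete, the following sketch computes the ordered fragment lengths expected when a known sequence is cut at every occurrence of a recognition site. It is an in silico illustration only, not the measurement procedure used on the surface-deposited molecules. The enzyme is an assumption: EcoRI (site GAATTC, cutting after the first G) is used as an example because the abstract does not name one, and the function name `restriction_map` and the toy molecule are hypothetical.

```python
def restriction_map(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Return ordered fragment lengths after cutting at every site occurrence.

    Assumptions: linear molecule, top strand only, complete digestion.
    cut_offset is the cut position relative to the start of the site
    (1 for EcoRI, which cuts between G and AATTC).
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    # Fragment lengths are the gaps between consecutive boundaries.
    boundaries = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(boundaries, boundaries[1:])]


# Toy molecule with two EcoRI sites, built piecewise so the sites are visible.
molecule = "AAA" + "GAATTC" + "CCCC" + "GAATTC" + "TT"
fragments = restriction_map(molecule)
```

An ordered list of fragment sizes like this is the kind of pattern that can be matched against a measured single-molecule map to identify which genomic region a molecule came from.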