We identified the sequence-specific starting positions of consecutive miscalls in the mapping of reads obtained from the Illumina Genome Analyzer (GA). Detailed analysis of the miscall pattern indicated that the underlying mechanism involves sequence-specific interference with the base elongation process during sequencing. The two major sequence patterns that trigger this sequence-specific error (SSE) are (i) inverted repeats and (ii) GGC sequences. We speculate that these sequences favor dephasing by inhibiting single-base elongation through (i) folding of single-stranded DNA and (ii) altered enzyme preference. This phenomenon is a major cause of sequence coverage variability and of the unfavorable bias observed for population-targeted methods such as RNA-seq and ChIP-seq. Moreover, SSE is a potential cause of false single-nucleotide polymorphism (SNP) calls and also significantly hinders de novo assembly. This article highlights the importance of recognizing SSE and its underlying mechanisms in the hope of enhancing the potential usefulness of the Illumina sequencers.
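As a rough illustration of the two trigger patterns named above, the following Python sketch scans a DNA string for GGC motifs and for short inverted repeats (an arm followed, after a small loop, by its reverse complement). It is not the authors' detection procedure: the function names, the arm length `arm`, and the maximum loop size `max_loop` are assumptions chosen only for the example.

```python
import re

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_ggc(seq):
    """Return the 0-based start positions of every GGC motif."""
    return [m.start() for m in re.finditer("GGC", seq)]


def find_inverted_repeats(seq, arm=4, max_loop=6):
    """Return (left_start, right_start, left_arm) for each inverted repeat.

    An inverted repeat here means an arm of length `arm` whose reverse
    complement occurs downstream, separated by 0..max_loop bases.
    Both parameters are illustrative assumptions, not published values.
    """
    hits = []
    n = len(seq)
    for i in range(n - 2 * arm + 1):
        left = seq[i:i + arm]
        target = reverse_complement(left)
        for loop in range(max_loop + 1):
            j = i + arm + loop
            if j + arm > n:
                break
            if seq[j:j + arm] == target:
                hits.append((i, j, left))
    return hits


if __name__ == "__main__":
    example = "TTAGGCAAAGCCTTT"  # hypothetical read fragment
    print("GGC starts:", find_ggc(example))
    print("Inverted repeats:", find_inverted_repeats(example))
```

Positions flagged by both scans would be candidate sites to inspect for clusters of miscalls in mapped reads. Whether a given site actually shows SSE still has to be checked against the data.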
DNA produces a wide range of structures in addition to the canonical B-form of double-stranded DNA. Some of these structures are stabilized by Hoogsteen bonds. We developed an experimentally parameterized, coarse-grained model that incorporates such bonds. The model reproduces many of the microscopic features of double-stranded DNA and captures the experimental melting curves for a number of short DNA hairpins, even when the open state forms complicated secondary structures. We demonstrate the utility of the model by simulating the folding of a thrombin aptamer, which contains G-quartets, and strand invasion during triplex formation. Our results highlight the importance of including Hoogsteen bonding in coarse-grained models of DNA.