Non-homologous end-joining (NHEJ) plays an important role in double-strand break (DSB) repair of DNA. Recent studies have shown that the error patterns of NHEJ are strongly biased by sequence context, but these studies were based on relatively few templates. To investigate this more thoroughly, we systematically profiled ∼1.16 million independent mutational events resulting from CRISPR/Cas9-mediated cleavage and NHEJ-mediated DSB repair of 6872 synthetic target sequences, introduced into a human cell line via lentiviral infection. We find that: (i) insertions are dominated by 1 bp events templated by sequence immediately upstream of the cleavage site, (ii) deletions are predominantly associated with microhomology and (iii) targets exhibit variable but reproducible diversity with respect to the number and relative frequency of the mutational outcomes to which they give rise. From these data, we trained a model that uses local sequence context to predict the distribution of mutational outcomes. Exploiting the bias of NHEJ outcomes towards microhomology-mediated events, we demonstrate the programming of deletion patterns by introducing microhomology to specific locations in the vicinity of the DSB site. We anticipate that our results will inform investigations of DSB repair mechanisms as well as the design of CRISPR/Cas9 experiments for diverse applications including genome-wide screens, gene therapy, lineage tracing and molecular recording.
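The two sequence-context effects described above (1 bp insertions templated by the base just upstream of the cut, and deletions flanked by microhomology) can be illustrated with a small enumeration. This is only a minimal sketch and not the trained model from the study: the function names, the `min_mh`/`max_mh` limits, the toy sequence and the convention that `cut` is the index of the first base downstream of the break are all assumptions made for illustration.

```python
def templated_insertion(seq, cut):
    """Duplicate the base immediately upstream of the cut (assumed to be seq[cut - 1])."""
    return seq[:cut] + seq[cut - 1] + seq[cut:]


def microhomology_deletions(seq, cut, min_mh=2, max_mh=6):
    """Enumerate deletion products flanked by microhomology around a cut.

    A left copy seq[a:a+k] must end at or before the cut and an identical
    right copy seq[b:b+k] must start at or after it. The product keeps one
    copy and drops the other plus everything in between.
    Returns {product_sequence: (microhomology, deletion_length)}.
    """
    outcomes = {}
    # Scan longer homologies first so each product is labelled with the
    # longest qualifying microhomology that produces it.
    for k in range(max_mh, min_mh - 1, -1):
        for a in range(0, cut - k + 1):
            left = seq[a:a + k]
            for b in range(cut, len(seq) - k + 1):
                if seq[b:b + k] == left:
                    product = seq[:a] + seq[b:]
                    outcomes.setdefault(product, (left, b - a))
    return outcomes


if __name__ == "__main__":
    target = "TTACGTGGGGACGTCC"  # hypothetical target, cut between index 7 and 8
    print(templated_insertion(target, 8))
    for product, (mh, length) in microhomology_deletions(target, 8).items():
        print(product, mh, length)
```

Shifted sub-windows of one microhomology yield the same product sequence, so keying the result on the product collapses them into a single outcome. A real predictor would also need to weight these candidates by observed frequency, which this sketch does not attempt.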
Cytosine, 5-methylcytosine, and 5-hydroxymethylcytosine were identified during translocation of single DNA template strands through a modified Mycobacterium smegmatis porin A (M2MspA) nanopore under control of phi29 DNA polymerase. This identification was based on three consecutive ionic current states that correspond to passage of modified or unmodified CG dinucleotides and their immediate neighbors through the nanopore limiting aperture. To establish quality scores for these calls, we examined ∼3,300 translocation events for 48 distinct DNA constructs. Each experiment analyzed a mixture of cytosine-, 5-methylcytosine-, and 5-hydroxymethylcytosine-bearing DNA strands that contained a marker that independently established the correct cytosine methylation status at the target CG of each molecule tested. To calculate error rates for these calls, we established decision boundaries using a variety of machine-learning methods. These error rates depended upon the identity of the bases immediately 5′ and 3′ of the targeted CG dinucleotide, and ranged from 1.7% to 12.2% for a single-pass read. We estimate that Q40 values (0.01% error rates) for methylation status calls could be achieved by reading single molecules 5-19 times depending upon sequence context.

MspA | epigenetics

Epigenetic modifications of DNA help regulate gene transcription in biological cells. In mammals, 5-methylcytosine (mC) modification of CG dinucleotides is known to influence development (1, 2) and contribute to human diseases including cancer (3). Other modifications have been detected at carbon 5 of cytosine including 5-hydroxymethylcytosine (hmC) (4), and more recently 5-formylcytosine, and 5-carboxycytosine (5). Physiological roles for hmC in carcinogenesis and embryonic stem cell differentiation have been proposed (6).

High-throughput techniques for mC detection are based on bisulfite treatment of genomic DNA (7). In the conventional assay, cytosine (but not mC nor hmC) is converted to uracil (8).
Thus, positions not converted to uracil identify cytosines that were modified in the original genomic sequence. In a landmark paper, Lister et al. (9) used this technique to map genome-wide cytosine methylation in human embryonic stem cells and fetal lung fibroblasts at single-nucleotide precision. Recently, bisulfite strategies for discriminating between mC and hmC using the Tet1 enzyme (10) or by chemical modification of hmC (11) have been described.

Single-molecule techniques have emerged as possible alternatives to bisulfite treatment for detecting epigenetic modifications of DNA (12). These single-molecule approaches share several useful features including few processing steps before sequence analysis, long reads that routinely exceed several thousand nucleotides, and the ability to read native DNA strands in heterogeneous mixtures. The most advanced of these single-molecule techniques, from Pacific Biosciences, uses fluorescence to detect labeled nucleotide triphosphates during daughter-strand elongation. This elongation is catalyzed by a DNA pol...
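The quality figures quoted in the abstract of this nanopore study (single-pass error rates of 1.7% to 12.2%, and a Q40 target equal to a 0.01% error rate) follow the standard Phred convention Q = -10 log10(p). The short sketch below only converts between error probability and Phred score; the function names are illustrative assumptions, and it does not reproduce the consensus model the authors used to estimate how many repeated reads are needed.

```python
import math


def phred_from_error(p):
    """Phred quality score for an error probability p (0 < p <= 1)."""
    return -10.0 * math.log10(p)


def error_from_phred(q):
    """Error probability corresponding to a Phred quality score q."""
    return 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    # Single-pass error rates reported in the abstract, as Phred scores.
    for p in (0.017, 0.122):
        print(f"error rate {p:.1%} corresponds to Q{phred_from_error(p):.1f}")
    # Q40 expressed as an error probability.
    print(f"Q40 corresponds to an error rate of {error_from_phred(40):.2%}")
```

On this scale the reported single-pass reads fall at roughly Q18 and Q9, which shows how far a single pass is from the Q40 target and why repeated reads of the same molecule are required.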
Highlights
• A deep learning model for AD prediction was derived from a large set of synthetic ADs
• The predictor (ADpred) identifies sequence features important for acidic AD function
• AD sequence features explain the basis for the fuzzy binding mechanism of acidic ADs
• Acidic ADs are enriched in yeast but not in Drosophila or human transcription factors