Motivation: Convolutional neural networks (CNN) have outperformed conventional methods in modeling the sequence specificity of DNA–protein binding. Yet inappropriate CNN architectures can yield poorer performance than simpler models. Thus an in-depth understanding of how to match CNN architecture to a given task is needed to fully harness the power of CNNs for computational biology applications.

Results: We present a systematic exploration of CNN architectures for predicting DNA sequence binding using a large compendium of transcription factor datasets. We identify the best-performing architectures by varying CNN width, depth and pooling designs. We find that adding convolutional kernels to a network is important for motif-based tasks. We show the benefits of CNNs in learning rich higher-order sequence features, such as secondary motifs and local sequence context, by comparing network performance on multiple modeling tasks ranging in difficulty. We also demonstrate how careful construction of sequence benchmark datasets, using approaches that control potentially confounding effects like positional or motif strength bias, is critical in making fair comparisons between competing methods. We explore how to establish the sufficiency of training data for these learning tasks, and we have created a flexible cloud-based framework that permits the rapid exploration of alternative neural network architectures for problems in computational biology.

Availability and Implementation: All the models analyzed are available at http://cnn.csail.mit.edu.

Contact: gifford@mit.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
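To make the role of a convolutional kernel in a motif-based task concrete, the following is a minimal sketch, not the authors' implementation. It assumes a one-hot encoding over A, C, G and T, a single hand-written kernel that rewards the toy motif "TGA", a ReLU activation, and global max pooling. The function names, the kernel weights and the example sequence are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: names, kernel weights, and the toy motif are
# assumptions for this example and do not come from the paper.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 floats (A, C, G, T).

    Symbols outside ACGT (for example N) become all-zero rows.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv_scan(encoded, kernel):
    """Slide one kernel along the sequence and apply a ReLU to each score.

    `kernel` is a list of 4-element weight rows, one row per motif position.
    """
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(4)
        )
        scores.append(max(0.0, total))
    return scores


def global_max_pool(scores):
    """Keep only the strongest match anywhere in the sequence."""
    return max(scores) if scores else 0.0


# Hypothetical kernel that rewards the 3-mer "TGA" (columns are A, C, G, T).
kernel = [
    [0, 0, 0, 1],  # position 1: T
    [0, 0, 1, 0],  # position 2: G
    [1, 0, 0, 0],  # position 3: A
]

seq = "CCTGACC"
scores = conv_scan(one_hot(seq), kernel)
pooled = global_max_pool(scores)
```

In this sketch, adding kernels (network width) would correspond to scanning with several such weight matrices in parallel, each contributing one pooled feature; in a trained CNN the weights are learned rather than written by hand.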
Relative to genomes of other sequenced organisms, the human genome appears particularly enriched for large, highly homologous segmental duplications (≥90% sequence identity and ≥10 kbp in length). The molecular basis for this enrichment is unknown. We sought to gain insight into the mechanism of origin by systematically examining sequence features at the junctions of duplications. We analyzed 9,464 junctions within regions of high-quality finished sequence from a genomewide set of 2,366 duplication alignments. We observed a highly significant (P < .0001) enrichment of Alu short interspersed element (SINE) sequences near or within the junction. Twenty-seven percent of all segmental duplications terminated within an Alu repeat. The Alu junction enrichment was most pronounced for interspersed segmental duplications separated by ≥1 Mb of intervening sequence. Alu elements at the junctions showed higher levels of divergence, consistent with Alu-Alu-mediated recombination events. When we classified Alu elements into major subfamilies, younger elements (AluY and AluS) accounted for the enrichment, whereas the oldest primate family (AluJ) showed no enrichment. We propose that the primate-specific burst of Alu retroposition activity (which occurred 35-40 million years ago) sensitized the ancestral human genome for Alu-Alu-mediated recombination events, which, in turn, initiated the expansion of gene-rich segmental duplications and their subsequent role in nonallelic homologous recombination.
Charge-neutral DNA nanoparticles have been developed in which single molecules of DNA are compacted to their minimal possible size. We speculated that the small size of these DNA nanoparticles may facilitate gene transfer in postmitotic cells, permitting nuclear uptake across the 25-nm nuclear membrane pore. To determine whether DNA nanoparticles can transfect nondividing cells, growth-arrested neuroblastoma and hepatoma cells were transfected with DNA/liposome mixtures encoding luciferase. In both models, growth-arrested cells were robustly transfected by compacted DNA (6,900–360-fold more than naked DNA). To evaluate mechanisms responsible for enhanced transfection, HuH-7 cells were microinjected with naked or compacted plasmids encoding enhanced green fluorescent protein. Cytoplasmic microinjection of DNA nanoparticles generated a ~10-fold improvement in transgene expression as compared with naked DNA; this enhancement was reversed by the nuclear pore inhibitor, wheat germ agglutinin. To determine the upper size limit for gene transfer, DNA nanoparticles of various sizes were microinjected into the cytoplasm. A marked decrease in transgene expression was observed as the minor ellipsoidal diameter approached 25 nm. In summary, suitably sized DNA nanoparticles productively transfect growth-arrested cells by traversing the nuclear membrane pore.

Although nonviral gene transfer methods transfect dividing cells, these technologies fail to transfect most postmitotic cells (1-10), with the principal exceptions of naked DNA gene transfer into muscle (11) and large volume hydrodynamic gene transfer into liver (12, 13). In dividing cells, nuclear membrane disintegration during mitosis allows plasmid DNA to enter the nucleus prior to membrane reformation. Otherwise, the intact nuclear membrane restricts transfer of naked DNA into the nucleus.
The nuclear membrane pore (NMP) has an internal channel diameter of 25 nm (14, 15) and does not permit naked DNA to effectively cross into the nucleus, probably due to the extended size of hydrated DNA and its negative charge density (4, 16, 17). The NMP does permit passive transfer of gold particles less than 9-10 nm in diameter and linear DNA fragments up to ~300 bp (18-22) as well as facilitated transport of proteins and small DNA segments (up to ~1 kbp) having nuclear localization signals (7, 22-28). The relative inefficiency of naked DNA, liposome-DNA complexes, and protein- and polymer-based DNA conjugates to transfect nondividing cells productively remains a significant barrier for in vivo gene therapy. Electrostatic interactions between polycationic polymers and DNA can result in conjugates consisting of one or more molecules of DNA and a sufficient number of polycations to produce a nearly charge-neutral complex (29-31). The ratio of positive to negative charges, buffer components, polycation counterion, DNA concentration, and pH, among other variables, influence the composition, size, and shape of these DNA conjugates (29, 32). Based on specific fo...