Cohesin is present in almost all active enhancer regions, where it is associated with transcription factors. Cohesin frequently colocalizes with CTCF (CCCTC-binding factor), affecting genomic stability, expression and epigenetic homeostasis. Cohesin subunits are mutated in cancer, but CTCF/cohesin-binding sites (CBSs) in DNA have not been examined for mutations. Here we report frequent mutations at CBSs in cancers displaying a mutational signature where mutations in A•T base pairs predominate. Integration of whole-genome sequencing data from 213 colorectal cancer (CRC) samples and chromatin immunoprecipitation sequencing (ChIP-exo) data identified frequent point mutations at CBSs. In contrast, CRCs showing an ultramutator phenotype caused by defects in the exonuclease domain of DNA polymerase ɛ (POLE) displayed significantly fewer mutations at and adjacent to CBSs. Analysis of public data showed that multiple cancer types accumulate CBS mutations. CBSs are a major mutational hotspot in the noncoding cancer genome.
Previous studies have reported that chromosome synteny in Lepidoptera has been well conserved, yet the number of haploid chromosomes varies widely from 5 to 223. Here we report the genome (393 Mb) of the Glanville fritillary butterfly (Melitaea cinxia; Nymphalidae), a widely recognized model species in metapopulation biology and eco-evolutionary research, which has the putative ancestral karyotype of n = 31. Using phylogenetic analyses of Nymphalidae and of other Lepidoptera, combined with orthologue-level comparisons of chromosomes, we conclude that the ancestral lepidopteran karyotype has been n = 31 for at least 140 My. We show that fusion chromosomes have retained the ancestral chromosome segments and that very few rearrangements have occurred across the fusion sites. The same, shortest ancestral chromosomes have independently participated in fusion events in species with smaller karyotypes. The short chromosomes have a higher rearrangement rate than long ones. These characteristics highlight distinctive features of the evolutionary dynamics of butterflies and moths.
A repetitive sequence collection is a set of sequences which are small variations of each other. Prominent examples are genome sequences of individuals of the same or closely related species, where the differences can be expressed by short lists of basic edit operations. Flexible and efficient data analysis on such a typically huge collection is plausible using suffix trees. However, the suffix tree occupies much space, which very soon inhibits in-memory analyses. Recent advances in full-text indexing reduce the space of the suffix tree to, essentially, that of the compressed sequences, while retaining its functionality with only a polylogarithmic slowdown. However, the underlying compression model considers only the predictability of the next sequence symbol given the k previous ones, where k is a small integer. This is unable to capture longer-term repetitiveness. For example, r identical copies of an incompressible sequence will be incompressible under this model. We develop new static and dynamic full-text indexes that are able to capture the fact that a collection is highly repetitive, and that require space basically proportional to the length of one typical sequence plus the total number of edit operations. The new indexes can be plugged into a recent dynamic fully-compressed suffix tree, achieving full functionality for sequence analysis, while retaining the reduced space and the polylogarithmic slowdown. Our experimental results confirm the practicality of our proposal.
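The limitation of the k-th order model can be illustrated with a small sketch. It is not taken from the paper: the function name `kth_order_entropy_bits`, the random test sequence, and the choices k = 2 and r = 10 are assumptions made for illustration. The code computes the empirical k-th order entropy bound (in bits) of one pseudo-random sequence and of r concatenated copies of it, and shows that the bound grows about r-fold even though the copies add no new information.

```python
import math
import random
from collections import Counter, defaultdict


def kth_order_entropy_bits(text: str, k: int) -> float:
    """Return n*H_k(text) in bits: for every length-k context, sum the
    zero-order entropy of the symbols that follow that context."""
    followers = defaultdict(Counter)
    for i in range(k, len(text)):
        followers[text[i - k:i]][text[i]] += 1
    total = 0.0
    for counts in followers.values():
        m = sum(counts.values())
        for c in counts.values():
            total -= c * math.log2(c / m)
    return total


# Hypothetical data: a pseudo-random DNA-like string stands in for an
# "incompressible" sequence; r identical copies form the collection.
random.seed(0)
base = "".join(random.choice("ACGT") for _ in range(2000))
r = 10
collection = base * r

one = kth_order_entropy_bits(base, 2)
many = kth_order_entropy_bits(collection, 2)
print(f"one copy: {one:.0f} bits, {r} copies: {many:.0f} bits, ratio {many / one:.2f}")
```

Under this model the collection costs roughly r times as much as a single copy, which is the gap that an index sized by one sequence plus the number of edits is meant to close.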
Streptococcus suis causes disease in pigs worldwide and is increasingly implicated in zoonotic disease in East and South-East Asia. To understand the genetic basis of disease in S. suis, we study the genomes of 375 isolates with detailed clinical phenotypes from pigs and humans from the United Kingdom and Vietnam. Here, we show that isolates associated with disease contain substantially fewer genes than non-clinical isolates, but are more likely to encode virulence factors. Human disease isolates are limited to a single virulent population, originating in the 1920s when pig production was intensified, but no consistent genomic differences between pig and human isolates are observed. There is little geographical clustering of different S. suis subpopulations, and the bacterium undergoes high rates of recombination, implying that an increase in virulence anywhere in the world could have a global impact over a short timescale.
We propose a generic approach to replace the canonical sequence representation of genomes with graph representations, and study several applications of such extensions. We extend the Burrows-Wheeler transform (BWT) of strings to acyclic directed labeled graphs, to support path queries as an extension to substring searching. We develop, apply, and tailor this technique to a) read alignment on an extended BWT index of a graph representing a pan-genome, i.e., a reference genome and known variants of it; and b) split-read alignment on an extended BWT index of a splicing graph. Other possible applications include probe/primer design, alignments to assembly graphs, and alignments to a phylogenetic tree of partial-order graphs. We report several experiments on the feasibility and applicability of the approach. Especially on highly polymorphic genome regions, our pan-genome index yields a significant improvement in alignment accuracy.
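As background for the string case that this work generalizes, the following is a minimal sketch of the classical BWT with backward search for counting substring occurrences. It does not implement the graph extension described above. The function names `bwt_index` and `count_occurrences`, the `$` sentinel, and the naive suffix sorting and linear-scan rank are assumptions chosen for brevity; practical indexes use compressed rank structures instead.

```python
from collections import Counter


def bwt_index(text: str):
    """Build the suffix array and BWT of text + '$' by sorting suffixes
    directly (only suitable for small inputs)."""
    s = text + "$"  # sentinel, assumed smaller than every other symbol
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    bwt = "".join(s[i - 1] for i in sa)  # symbol preceding each suffix
    return sa, bwt


def count_occurrences(bwt: str, pattern: str) -> int:
    """Count occurrences of pattern using backward search on the BWT."""
    # C[c] = number of symbols in the text that are smaller than c
    counts = Counter(bwt)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    lo, hi = 0, len(bwt)  # half-open range of matching suffixes
    for c in reversed(pattern):
        if c not in C:
            return 0
        # rank(c, i) = occurrences of c in bwt[:i], done here by a scan
        lo = C[c] + bwt[:lo].count(c)
        hi = C[c] + bwt[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


sa, bwt = bwt_index("GATTACATTA")
print(count_occurrences(bwt, "TTA"))
```

Backward search processes the pattern from its last symbol to its first, narrowing a range of sorted suffixes at each step; the graph extension replaces suffixes of a string with paths in a labeled graph.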