We present GuideScan software for the design of CRISPR guide RNA libraries that can be used to edit coding and noncoding genomic regions. GuideScan produces high-density sets of gRNAs for single- and paired-gRNA genome-wide screens. We also show that, by using a trie data structure, GuideScan designs gRNAs that are more specific than those designed by existing tools.
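To illustrate why a trie suits specificity checks, the following is a minimal sketch of a mismatch-tolerant lookup over indexed target sequences. It is not GuideScan's implementation: the class and method names (`GuideTrie`, `count_within`) are hypothetical, and the 8-nt sequences are toy data chosen for readability.

```python
class TrieNode:
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}  # base -> TrieNode
        self.count = 0      # number of indexed sequences ending at this node


class GuideTrie:
    """Toy trie over DNA strings (hypothetical helper, for illustration only)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, seq):
        node = self.root
        for base in seq:
            node = node.children.setdefault(base, TrieNode())
        node.count += 1

    def count_within(self, query, max_mismatches):
        """Count indexed sequences of len(query) within the given Hamming distance."""
        total = 0
        stack = [(self.root, 0, 0)]  # (node, depth, mismatches so far)
        while stack:
            node, depth, mismatches = stack.pop()
            if depth == len(query):
                total += node.count
                continue
            for base, child in node.children.items():
                cost = mismatches + (base != query[depth])
                # Prune whole subtrees once the mismatch budget is exceeded.
                if cost <= max_mismatches:
                    stack.append((child, depth + 1, cost))
        return total


trie = GuideTrie()
for site in ["ACGTACGT", "ACGTACGA", "ACGTTCGA", "TTTTTTTT"]:
    trie.insert(site)

exact = trie.count_within("ACGTACGT", 0)
near = trie.count_within("ACGTACGT", 2)
print(exact, near)
```

Because shared prefixes are stored once and a branch is abandoned as soon as it exceeds the mismatch budget, a candidate can be checked against many near-matching sites without comparing it to each site individually. A candidate with one exact hit and few near hits would be considered more specific under this toy criterion.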
Here we present HiC-DC, a principled method to estimate the statistical significance (P values) of chromatin interactions from Hi-C experiments. HiC-DC uses hurdle negative binomial regression to account for systematic sources of variation in Hi-C read counts (for example, distance-dependent random polymer ligation, GC content bias, and mappability bias) and to model zero inflation and overdispersion. Applied to high-resolution Hi-C data in a lymphoblastoid cell line, HiC-DC detects significant interactions at the sub-topologically associating domain level, identifying potential structural and regulatory interactions supported by CTCF binding sites, DNase accessibility, and/or active histone marks. CTCF-associated interactions are most strongly enriched in the middle genomic distance range (∼700 kb–1.5 Mb), while interactions involving actively marked DNase accessible elements are enriched both at short (<500 kb) and longer (>1.5 Mb) genomic distances. There is a striking enrichment of longer-range interactions connecting replication-dependent histone genes on chromosome 6, potentially representing the chromatin architecture at the histone locus body.
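The idea behind a hurdle negative binomial model can be sketched as follows: zero counts are modeled by a separate probability, and positive counts follow a zero-truncated negative binomial. This is a minimal sketch, not the HiC-DC code. The function names are hypothetical, the parameters `pi_zero`, `mu`, and `theta` are fixed here rather than fitted from covariates such as genomic distance, GC content, and mappability, and the upper-tail P value is an assumed convention for illustration.

```python
import math


def nb_log_pmf(y, mu, theta):
    """Log pmf of a negative binomial with mean mu and dispersion (size) theta."""
    return (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
            + theta * math.log(theta / (theta + mu))
            + y * math.log(mu / (theta + mu)))


def hurdle_nb_pmf(y, pi_zero, mu, theta):
    """Hurdle model: P(Y=0) = pi_zero; positives follow a zero-truncated NB."""
    if y == 0:
        return pi_zero
    nb_zero = math.exp(nb_log_pmf(0, mu, theta))
    truncated = math.exp(nb_log_pmf(y, mu, theta)) / (1.0 - nb_zero)
    return (1.0 - pi_zero) * truncated


def upper_tail_p_value(y, pi_zero, mu, theta):
    """P(Y >= y) under the hurdle model (assumed convention for this sketch)."""
    if y <= 0:
        return 1.0
    below = sum(hurdle_nb_pmf(k, pi_zero, mu, theta) for k in range(y))
    return max(0.0, 1.0 - below)


# Illustrative parameter values only; a real analysis would estimate them
# per bin pair by regression on the covariates.
pi_zero, mu, theta = 0.7, 2.0, 1.5
for observed in (1, 5, 15):
    print(observed, upper_tail_p_value(observed, pi_zero, mu, theta))
```

Separating the zero probability from the positive-count distribution lets the model absorb the many empty bin pairs in sparse contact maps, while the dispersion parameter `theta` allows the variance to exceed the mean.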
Decoding transcription factor (TF) binding signals in genomic DNA is a fundamental problem. Here we present a prediction model called BindSpace that learns to embed DNA sequences and TF class/family labels into the same space. By training on binding data for hundreds of TFs and embedding over 1M DNA sequences, BindSpace achieves state-of-the-art multiclass binding prediction performance, in vitro and in vivo, and can distinguish signals of closely related TFs.
Summary
A significant challenge of functional genomics is to develop methods for genome-scale acquisition and analysis of cell biological data. Here, we present an integrated method that combines genome-wide genetic perturbation of Saccharomyces cerevisiae with high-content screening to facilitate the genetic description of sub-cellular structures and compartment morphology. As proof-of-principle, we used a Rad52-GFP marker to examine DNA damage foci in ~20 million single cells from ~5000 different mutant backgrounds in the context of selected genetic or chemical perturbations. Phenotypes were classified using a machine learning-based automated image analysis pipeline. 345 mutants were identified that had elevated numbers of DNA damage foci, almost half of which were identified only in sensitized backgrounds. Subsequent analysis of Vid22, a protein implicated in the DNA damage response, revealed that it acts together with the Sgs1 helicase at sites of DNA damage, and preferentially binds G-quadruplex regions of the genome. This approach is extensible to numerous other cell biological markers and experimental systems.