2015
DOI: 10.1101/028399
Preprint

Basset: Learning the regulatory code of the accessible genome with deep convolutional neural networks

Abstract: The complex language of eukaryotic gene expression remains incompletely understood. Thus, most of the many noncoding variants statistically associated with human disease have unknown mechanism. Here, we address this challenge using an approach based on a recent machine learning advance: deep convolutional neural networks (CNNs). We introduce an open source package Basset (https://github.com/davek44/Basset) to apply deep CNNs to learn the functional activity of DNA sequences from genomics data. We trained Basse…
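The abstract describes applying deep CNNs to DNA sequence but gives no architecture details in this excerpt. As a minimal sketch of the basic operation such a model stacks, assuming a one-hot sequence encoding, the code below scans one convolution filter along a sequence and then applies ReLU, max pooling, and a sigmoid output. The filter (a one-hot `GATA` pattern), the weight and bias values, and every function name are hypothetical; this is not Basset's code or architecture.

```python
import math

ALPHABET = "ACGT"

def one_hot(seq):
    # Each base becomes a 4-element row in A, C, G, T order; any other
    # character (e.g. N) becomes all zeros.
    return [[1.0 if base == b else 0.0 for b in ALPHABET] for base in seq.upper()]

def conv1d(x, filt, bias=0.0):
    # "Valid" cross-correlation of an L x 4 matrix with a W x 4 filter,
    # producing L - W + 1 activations (the usual convention in CNN libraries).
    width = len(filt)
    return [
        bias + sum(x[i + j][k] * filt[j][k] for j in range(width) for k in range(4))
        for i in range(len(x) - width + 1)
    ]

def relu(values):
    return [max(0.0, v) for v in values]

def max_pool(values, size):
    # Non-overlapping max pooling; a shorter final window is kept.
    return [max(values[i:i + size]) for i in range(0, len(values), size)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical filter: a one-hot "GATA" pattern scores 4 on an exact match.
GATA_FILTER = one_hot("GATA")

def toy_accessibility(seq, conv_bias=-3.0, weight=6.0, out_bias=-3.0):
    """Hypothetical one-filter model: conv -> ReLU -> max pool -> global max -> sigmoid."""
    acts = relu(conv1d(one_hot(seq), GATA_FILTER, bias=conv_bias))
    if not acts:  # sequence shorter than the filter
        return sigmoid(out_bias)
    pooled = max_pool(acts, size=2)
    return sigmoid(weight * max(pooled) + out_bias)

print(round(toy_accessibility("TTGATATT"), 3))  # 0.953
print(round(toy_accessibility("TTTTTTTT"), 3))  # 0.047
```

A real model of this kind learns many filters jointly, stacks several convolution layers, and emits one sigmoid output per cell type or assay; the single hand-set filter here only shows how a motif-like pattern turns into an activation.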


Cited by 163 publications (278 citation statements)
References 58 publications
“…In particular, we aimed to identify DNA sequences that could predict cell-type-specific effects of regulatory variants. We investigated the use of machine learning models to predict the chromatin activity of regulatory elements across our three cell types using DNA sequence only (Zhou and Troyanskaya 2015; Hashimoto et al 2016; Kelley et al 2016; Zeng et al 2016). We developed a four-layered neural network architecture, OrbWeaver, to predict cell-type-specific chromatin accessibility of 500-bp windows centered at a regulatory locus (Fig.…”
Section: Sequence-based Model For Chromatin Activity Explains the Reg… (mentioning)
confidence: 99%
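The quoted study feeds its model 500-bp windows centered at a regulatory locus. A minimal sketch of that windowing step, assuming 0-based coordinates, a sequence held in memory as a string, and `N` padding near sequence ends (the padding choice and the name `centered_window` are assumptions, not OrbWeaver's code):

```python
def centered_window(chrom_seq, center, width=500, pad="N"):
    """Return `width` bases of chrom_seq centered on 0-based position `center`.

    The window covers [center - width // 2, center - width // 2 + width);
    positions outside the sequence are filled with `pad`.
    """
    start = center - width // 2
    return "".join(
        chrom_seq[i] if 0 <= i < len(chrom_seq) else pad
        for i in range(start, start + width)
    )

# Example: a locus near the start of a short toy sequence gets left-padded.
print(centered_window("ACGTACGT", 1, width=6))  # NNACGT
```

For an even width the locus sits at index `width // 2` of the returned string. A real pipeline would more likely read windows from an indexed genome file and might drop, rather than pad, loci near chromosome ends.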
“…Although SCM differs from existing methods aimed at binary classification of hypersensitive and nonhypersensitive chromatin, we asked how SCM performance compares to four sequence-based classifiers that use either k-mer based models (gkm-svm, SeqGL) or deep learning based models (deepSEA, Basset) (Ghandi et al 2014; Setty and Leslie 2015; Zhou and Troyanskaya 2015; Kelley et al 2016). Although SCM is designed for quantitation and not binary prediction, SCM performs as well as the four state-of-the-art binary predictive methods on black-box binary prediction of functional genomic regions (Supplemental Fig.…”
Section: www.genome.org (mentioning)
confidence: 99%
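Two of the comparison methods named here, gkm-svm and SeqGL, are k-mer based. As a rough illustration of what a k-mer feature representation looks like, the sketch below counts ordinary (ungapped) k-mers, merging each k-mer with its reverse complement. This is a simplification chosen for brevity: gkm-SVM itself uses gapped k-mers, and the function names and default `k` are hypothetical.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]

def canonical(kmer):
    # A k-mer and its reverse complement describe the same double-stranded site.
    return min(kmer, revcomp(kmer))

def kmer_spectrum(seq, k=3):
    """Count canonical k-mers, skipping any k-mer containing a non-ACGT base."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return counts

def feature_vector(seq, k=3):
    """Fixed-order count vector over all canonical k-mers, e.g. as input to a linear SVM."""
    keys = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    counts = kmer_spectrum(seq, k)
    return keys, [counts[key] for key in keys]

print(dict(kmer_spectrum("ACGT", k=2)))  # {'AC': 2, 'CG': 1}
```

Here `AC` and `GT` collapse into one feature because they are reverse complements. The vector length grows as roughly 4^k / 2, which is one reason gapped k-mer methods exist.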
“…These methods include DeepSEA (Zhou and Troyanskaya, 2015), DeepBind (Alipanahi et al, 2015) and Basset (Kelley et al, 2016) that 'deep learn' regulatory sequence code from big genomics data; deltaSVM (Lee et al, 2015) and deSNPs (Huang and Ovcharenko, 2015; Li and Ovcharenko, 2015) that learn sequence features from a single enhancer-associated chromatin profile and consider the k-mer content associated with the genetic variant only; CATO (Maurano et al, 2015) that predicts chromatin states by using high-throughput sequencing data across multiple individuals; C-SCORE (Kircher et al, 2014) that integrates multiple annotations into one metric by contrasting variants that survived natural selection with simulated mutations; LINSIGHT (Huang et al, 2017) that predict the likelihood of deleterious fitness consequences of mutations at noncoding nucleotide sites by combining a generalized linear model for functional genomic data with a probabilistic model of molecular evolution; and CAPE (Li et al, 2016) that decomposes the sequence code of potential-binding sites and the binding sites of cofactors from a set of chromatin profiles, and directly quantifies the deactivating effect of a single nucleotide mutation based on the corresponding change in the underlying k-mer profile.…”
Section: Introduction (mentioning)
confidence: 99%
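Several methods in this list (for example deltaSVM and CAPE) score a single-nucleotide variant by how it changes the k-mer content around the site. A minimal sketch in that spirit, assuming a precomputed table `weights` that maps k-mers to scores (the table, its values, and the function names are hypothetical): sum the weights of every k-mer that overlaps the variant in the alternate sequence and subtract the same sum for the reference.

```python
def overlapping_kmers(seq, pos, k):
    """All k-mers of seq that contain 0-based position pos."""
    first = max(0, pos - k + 1)
    last = min(pos, len(seq) - k)
    return [seq[i:i + k] for i in range(first, last + 1)]

def delta_score(ref_seq, pos, alt, weights, k):
    """Score change for substituting base `alt` at `pos`.

    Missing k-mers contribute 0. Positive values mean the alternate allele
    gains k-mers with higher (hypothetical) weights.
    """
    if len(alt) != 1:
        raise ValueError("only single-nucleotide substitutions are handled")
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    ref_sum = sum(weights.get(km, 0.0) for km in overlapping_kmers(ref_seq, pos, k))
    alt_sum = sum(weights.get(km, 0.0) for km in overlapping_kmers(alt_seq, pos, k))
    return alt_sum - ref_sum

# Toy weights: the C>A change at position 1 turns GCTA into GATA.
weights = {"GAT": 1.0, "ATA": 0.5}
print(delta_score("GCTA", 1, "A", weights, k=3))  # 1.5
```

Only the k-mers touching the variant can change, so the score needs at most k lookups per allele; the same loop run over all three alternate bases at every position gives a simple in silico mutagenesis map.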