2016
DOI: 10.1101/gr.200535.115

Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks

Abstract: The complex language of eukaryotic gene expression remains incompletely understood. Despite the importance suggested by many noncoding variants statistically associated with human disease, nearly all such variants have unknown mechanisms. Here, we address this challenge using an approach based on a recent machine learning advance—deep convolutional neural networks (CNNs). We introduce the open source package Basset to apply CNNs to learn the functional activity of DNA sequences from genomics data. We trained B…

Cited by 887 publications (800 citation statements)
References 56 publications
“…Orthogonal approaches to predict effects of non-coding variants using deep convolutional neural networks have been proposed 166. Multiple methods to characterize lincRNA interactomes have been described; several are highlighted here.…”
Section: Figure (mentioning)
confidence: 99%
“…SCM’s regression approach bypasses the need to call peaks and models synergistic relationships among nearby k-mers to directly predict the accessibility of a region. More recently, several deep learning methods [74, 75] aim to predict chromatin features including accessibility [76–78] (Table S1). A recent approach is Basset [78], which uses convolutional neural networks to learn context-specific sequence predictors of DNA accessibility.…”
Section: Identification Of Regulatory Sequence Elements and Their Gen… (mentioning)
confidence: 99%
“…The development of statistical and machine-learning methods that attempt to address this integrative prediction challenge has emerged as an active, fast-moving area of research. Recently published methods in this area can be roughly divided into three categories: (1) machine-learning classifiers that attempt to separate known disease variants from putatively benign variants using a variety of genomic features (e.g., GWAVA 13 and FATHMM-MKL 14); (2) sequence- and motif-based predictors for the impact of noncoding variants on cell-type-specific molecular phenotypes, such as chromatin accessibility or histone modifications (e.g., DeepBind 15, DeepSEA 16 and Basset 17); and (3) evolutionary methods that consider data on genetic variation together with functional genomic data and aim to predict the effects of noncoding variants on fitness (e.g., CADD 18, DANN 19, FunSeq2 20, and fitCons 3). A limitation of methods of the first class is that they depend strongly on the available training data, which may be limited and may not be representative of the broader class of regulatory sequences of interest.…”
Section: Introduction (mentioning)
confidence: 99%