2019
DOI: 10.1093/nar/gkz654

A deep learning genome-mining strategy for biosynthetic gene cluster prediction

Abstract: Natural products represent a rich reservoir of small molecule drug candidates utilized as antimicrobial drugs, anticancer therapies, and immunomodulatory agents. These molecules are microbial secondary metabolites synthesized by co-localized genes termed Biosynthetic Gene Clusters (BGCs). The increase in full microbial genomes and similar resources has led to development of BGC prediction algorithms, although their precision and ability to identify novel BGC classes could be improved. Here we present a deep le…

Cited by 215 publications (240 citation statements)
References 39 publications (61 reference statements)
“…Supervised learning was shown to perform well at BGC discovery in previous work that focused on handling bacteria data [5], [6]. Given that annotated data are needed to perform a supervised learning approach, we propose here fungal BGC datasets to support the development of this approach for fungi.…”
Section: A Proposed Datasets (mentioning)
confidence: 99%
“…To generate classification models based on a supervised learning method, we extracted Pfam [22] IDs from the positive and negative instances. All datasets were converted into pfamtsv format [6], which is required as input in the supervised learning approach applied in this work. For each dataset, 80% were randomly selected for the training phase, while 20% were held out for the validation phase, as shown in Table I.…”
Section: A Proposed Datasets (mentioning)
confidence: 99%
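The random 80%/20% training/validation split described in the quoted passage can be sketched as follows. This is a minimal illustration under stated assumptions, not the citing authors' code. The function name `split_train_validation`, the fixed seed, and the toy dataset of (label, Pfam ID list) pairs with placeholder IDs are all hypothetical. The pfamtsv conversion step is not shown.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_validation(
    instances: Sequence[T], train_fraction: float = 0.8, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Randomly split one dataset into a training subset and a held-out validation subset.

    Hypothetical helper: a plain (unstratified) random split of each dataset,
    assumed here to mirror the "80% training / 20% validation" description.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(instances)                # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)     # seeded RNG makes the split reproducible
    n_train = round(len(shuffled) * train_fraction)  # Python rounds exact halves to even
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical toy dataset: label 1 = BGC instance, 0 = non-BGC instance,
# each paired with a list of placeholder Pfam-style IDs (not real annotations).
dataset = [(i % 2, [f"PF{i:05d}"]) for i in range(10)]
train, validation = split_train_validation(dataset, train_fraction=0.8, seed=42)
print(len(train), len(validation))  # 8 2
```

A fixed seed is used so that the same held-out set is reproduced on every run. If class balance across the two subsets matters, splitting the positive and negative instances separately (a stratified split) would be a natural variation. The quoted text does not say which variant was used.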