2022
DOI: 10.1371/journal.pcbi.1010610

DPCfam: Unsupervised protein family classification by Density Peak Clustering of large sequence datasets

Abstract: Proteins that are known only at a sequence level outnumber those with an experimental characterization by orders of magnitude. Classifying protein regions (domains) into homologous families can generate testable functional hypotheses for yet unannotated sequences. Existing domain family resources typically use at least some degree of manual curation: they grow slowly over time and leave a large fraction of the protein sequence space unclassified. We here describe automatic clustering by Density Peak Clustering…

Cited by 5 publications (19 citation statements)
References 41 publications
“…Results were in line with what we had observed when running DPCfam on UniRef50 [27]; indeed, the majority of the MCs (88.96%) did not fall into any of these categories. We noted that the fraction of disordered MCs was rather small (about 1%).…”
Section: Metaclusters General Properties (supporting)
confidence: 90%
“…In this regime of 'the rich get richer', elements larger in size are usually the more interesting ones. Moreover, as noted in our previous work [27], MCs of smaller size are, on average, of lower quality. Because of this, we decide to focus our downstream analysis on the set of MCs containing at least 50 members.…”
Section: Metaclusters General Properties (supporting)
confidence: 70%