2019
DOI: 10.1186/s12859-019-2973-4

RAFTS3G: an efficient and versatile clustering software to analyses in large protein datasets

Abstract: Background: Clustering methods are essential for partitioning biological samples and help reduce information complexity in large datasets. Tools in this context usually generate data with greedy algorithms that address some data-mining difficulties but can degrade biologically relevant information during the clustering process. The lack of standardized metrics and consistent bases also raises questions about the clustering efficiency of some methods. Benchmarks are needed to explor…
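The concern about greedy algorithms can be illustrated with a minimal sketch of generic greedy incremental clustering, the representative-based scheme common in greedy clustering tools. This is not RAFTS3G's algorithm: the 2-mer Jaccard similarity, the 0.5 threshold, and the toy sequences are assumptions chosen for illustration. The sketch shows one way a greedy method can distort results, namely that the same three sequences are partitioned differently depending on input order.

```python
from typing import Callable, List


def kmer_jaccard(a: str, b: str, k: int = 2) -> float:
    """ASSUMED similarity for illustration: Jaccard index of distinct k-mers."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def greedy_cluster(seqs: List[str],
                   sim: Callable[[str, str], float],
                   threshold: float) -> List[List[str]]:
    """Each sequence joins the first cluster whose representative it matches."""
    clusters: List[List[str]] = []  # clusters[i][0] is the representative
    for s in seqs:
        for cluster in clusters:
            if sim(cluster[0], s) >= threshold:
                cluster.append(s)
                break
        else:
            clusters.append([s])  # no representative matched: s founds a new cluster
    return clusters


# Same sequences, different input order, different partitions.
print(greedy_cluster(["MKLV", "KLVA", "LVAG"], kmer_jaccard, 0.5))
# [['MKLV', 'KLVA'], ['LVAG']]
print(greedy_cluster(["KLVA", "MKLV", "LVAG"], kmer_jaccard, 0.5))
# [['KLVA', 'MKLV', 'LVAG']]
```

Because `MKLV` and `LVAG` are linked only through `KLVA`, whichever sequence is processed first and becomes the representative decides whether they end up in the same cluster.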

Cited by 5 publications (4 citation statements); references 33 publications.
“…The complete set of proteins translated from the CDS of each of the 67 genomes were joined in a FASTA file and clustered using the RAFTS3G tool [ 44 ] (available at: ) applying a minimal self-score of 0.7 as a threshold parameter for a specific protein to be included in a cluster. Two groupings were performed separately, RAFTS3G-32 for the Azoarcus-Aromatoleum group and RAFTS3G-67 for the 67 genomes.…”
Section: Methods (mentioning)
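One way to read the 0.7 "minimal self-score" quoted above is as a normalized score: a candidate's score against a cluster representative divided by the representative's score against itself. The sketch below assumes that reading and substitutes a hypothetical alignment-free raw score (the number of distinct shared 3-mers). RAFTS3G's actual scoring function is not described here, so treat this only as an illustration of how a self-score-normalized threshold behaves.

```python
from typing import List, Set


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Distinct k-mers of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def self_score_ratio(rep: str, candidate: str, k: int = 3) -> float:
    """ASSUMPTION: raw score = shared distinct k-mers; self-score = rep's own k-mer count."""
    rep_kmers = kmers(rep, k)
    self_score = len(rep_kmers)
    if self_score == 0:
        return 0.0
    return len(rep_kmers & kmers(candidate, k)) / self_score


def members(rep: str, candidates: List[str], threshold: float = 0.7) -> List[str]:
    """Candidates whose normalized score against the representative meets the threshold."""
    return [c for c in candidates if self_score_ratio(rep, c) >= threshold]


print(self_score_ratio("MKTAYIAKQR", "MKTAYIAKQL"))  # 0.875
print(members("MKTAYIAKQR", ["MKTAYIAKQL", "MKTAYIPPPP"]))  # ['MKTAYIAKQL']
```

With this raw score, dividing by the representative's self-score keeps the ratio in [0, 1], but it also makes the ratio asymmetric: swapping representative and candidate can change the value.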
“…For nitrate reduction pathway analysis, we searched the RAFTS3G-32 clusters to identify the presence of genes encoding nitrate reductase ( nar / nap ), nitrite reductase ( nir ), nitric oxide reductase ( nor ), and nitrous oxide reductase ( nos ) using Aromatoleum sp. CIB corresponding marker genes (RAFTS3, [ 44 , 46 ]) and curated them manually. Aromatic compound degradation pathway genes including those for the anaerobic benzoate degradation pathway gene cluster ( bzd ), aerobic benzoate degradation pathway gene cluster ( box ), and genes from the “lower pathway” ( LP ) as marker genes were similarly searched in the RAFTS3G-32 groups.…”
Section: Methods (mentioning)
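The marker-gene search described in the quoted Methods amounts to a lookup: find the cluster that contains each reference protein and list which genomes contribute members to it. A minimal sketch follows, assuming a hypothetical `genome|protein` identifier convention and toy cluster contents (neither comes from the source). As the quoted authors note, hits found this way would still need manual curation.

```python
from typing import Dict, List, Set


def marker_presence(clusters: Dict[str, List[str]],
                    markers: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each marker gene to the set of genomes sharing its reference protein's cluster."""
    # Invert the clustering: protein ID -> cluster ID.
    protein_to_cluster = {p: cid for cid, members in clusters.items() for p in members}
    presence: Dict[str, Set[str]] = {}
    for gene, ref_protein in markers.items():
        cid = protein_to_cluster.get(ref_protein)
        if cid is None:
            presence[gene] = set()  # reference protein not found in any cluster
            continue
        # HYPOTHETICAL ID convention "genome|protein": keep only the genome part.
        presence[gene] = {p.split("|", 1)[0] for p in clusters[cid]}
    return presence


# Toy, hypothetical data: "CIB" stands in for the Aromatoleum sp. CIB reference.
clusters = {"c1": ["CIB|p1", "G2|p17", "G3|p4"], "c2": ["CIB|p2", "G2|p88"]}
markers = {"nar": "CIB|p1", "nos": "CIB|p2", "nir": "CIB|p3"}
print(marker_presence(clusters, markers))
```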
“…Extracting and analyzing whole genomes from miscellaneous species and mapping genes or enzymatic complexes are great challenges. It is crucial to integrate techniques to reduce informational complexity, without loss of biological meaning in a large volume of biological data [20]. In this scenario, Artificial Intelligence (AI) for bioinformatics studies becomes an important strategy of computational and statistical methods for the manipulation and extraction of knowledge from the BNF.…”
Section: Introduction (mentioning)
“…By overcoming some of the major disadvantages of alignments, such as strong evolutionary assumptions [28], high computational costs [29] as well as nonnumerical sequence representation, alignment-free methods evolved as a true alternative for quantifying sequence (dis-)similarity [30]. At present, respective methods are used in the domains of phylogenetics [31][32][33], (meta-)genomics [34,35], database similarity search [36], or next-generation sequencing data analyses [37][38][39].…”
Section: Introduction (mentioning)
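As a concrete instance of the alignment-free idea, the sketch below represents each sequence by its k-mer count profile and takes the Euclidean distance between the profiles, one of the simplest word-frequency measures. It is offered only as an illustration: k = 2 and the toy sequences are arbitrary choices, and this is not presented as the specific method used by RAFTS3G or by the cited works.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer (word) in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_euclidean(a: str, b: str, k: int = 2) -> float:
    """Euclidean distance between the k-mer count vectors of a and b (no alignment)."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    words = ca.keys() | cb.keys()  # union of observed k-mers; unseen words count as 0
    return math.sqrt(sum((ca[w] - cb[w]) ** 2 for w in words))


print(kmer_euclidean("ACGT", "ACGT"))  # 0.0
print(kmer_euclidean("ACGT", "ACGA"))  # sqrt(2), about 1.414
```

Raw counts grow with sequence length, so alignment-free tools typically normalize profiles (for example, to relative frequencies) before comparing sequences of different lengths.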