2016
DOI: 10.1186/s12859-016-1060-3

Addressing inaccuracies in BLOSUM computation improves homology search performance

Abstract: Background: BLOSUM matrices belong to the most commonly used substitution matrix series for protein homology search and sequence alignments since their publication in 1992. In 2008, Styczynski et al. discovered miscalculations in the clustering step of the matrix computation. Still, the RBLOSUM64 matrix based on the corrected BLOSUM code was reported to perform worse at a statistically significant level than the BLOSUM62. Here, we present a further correction of the (R)BLOSUM code and provide a thorough performan…

Cited by 18 publications (17 citation statements)
References: 24 publications (78 reference statements)
“…For a list of homologous search results ordered by their E-values, this measure represents the fraction of the correctly found, true positive superfamily relations which remain after cutting the list in order to restrict the number of false positives to a certain amount. We set this threshold to 0.01 errors per query (epq) in concordance to other studies [25, 35]. This effectively restricts the number of false positives found within 100 queries to a single false positive.…”
Section: Methods (mentioning)
confidence: 99%
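
The coverage measure described in this statement can be sketched as follows. This is a minimal illustration of one common reading of "coverage at a given errors-per-query threshold", not the citing authors' implementation. The function name `coverage_at_epq`, the `(e_value, is_true_positive)` hit format, and the choice to cut the list just before the first false positive that would exceed the budget are all assumptions made for the example.

```python
import math
from typing import Iterable, Tuple


def coverage_at_epq(
    hits: Iterable[Tuple[float, bool]],
    n_queries: int,
    n_true_relations: int,
    epq: float = 0.01,
) -> float:
    """Fraction of true relations recovered before the false-positive budget is exceeded.

    hits: pooled search results from all queries as (e_value, is_true_positive).
    n_queries: number of queries that produced the pooled hits.
    n_true_relations: total number of true superfamily relations that could be found.
    epq: allowed errors (false positives) per query.
    """
    # Budget of false positives, e.g. 0.01 epq * 100 queries = 1 false positive.
    allowed_fp = math.floor(epq * n_queries + 1e-9)

    true_pos = 0
    false_pos = 0
    # Walk the list from the most to the least significant hit.
    for _e_value, is_true in sorted(hits, key=lambda h: h[0]):
        if is_true:
            true_pos += 1
        else:
            if false_pos + 1 > allowed_fp:
                break  # cutting point: one more false positive would exceed the budget
            false_pos += 1

    return true_pos / n_true_relations


# Toy data (hypothetical): 100 queries, 5 true relations in total.
example_hits = [
    (1e-30, True),
    (1e-20, True),
    (1e-5, False),
    (1e-3, True),
    (0.1, False),
    (1.0, True),
]
example_coverage = coverage_at_epq(example_hits, n_queries=100, n_true_relations=5)
```

With 100 queries and 0.01 epq the budget is a single false positive, so in the toy data the list is cut at the second false positive and three of the five true relations count as found.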
“…Similarly RBLOSUM matrices were recomputed using the algorithm developed by Styczynski et al [4] and the respective programs were obtained from http://web.mit.edu/bamel/blosum/revised_blosum.c . CorBLOSUM [6] matrices were directly obtained from http://www.cbs.tu-darmstadt.de/CorBLOSUM/ . BLOSUM50 and BLOSUM62, the most widely used BLOSUM matrices, were considered in the present study.…”
Section: Main Text (mentioning)
confidence: 99%
“…These studies are seen carried out using the initial BLOCKS release version 5.0. Hess et al have recently proposed CorBLOSUM [6] by addressing an inaccuracy in the BLOSUM code. They relied on computing the matrices by changing the datatype threshold from integer to float.…”
Section: Introduction (mentioning)
confidence: 99%
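
The statement above does not show the code in question, so the following is only a hypothetical illustration of why the data type of a clustering threshold can matter; it is not the BLOSUM source. The function names, the block width of 35, and the way the threshold is derived from a percentage are assumptions chosen for the example.

```python
def clustered_with_int_threshold(identities: int, width: int, percent: int) -> bool:
    """Clustering decision when the threshold is stored as an integer (truncated)."""
    threshold = width * percent // 100  # 35 * 62 // 100 == 21, the fraction is lost
    return identities >= threshold


def clustered_with_float_threshold(identities: int, width: int, percent: int) -> bool:
    """Clustering decision when the threshold keeps its fractional part."""
    threshold = width * percent / 100.0  # 35 * 62 / 100 == 21.7
    return identities >= threshold


# Two segments of width 35 sharing 21 identical positions are 60% identical.
# With a 62% clustering level they should stay in separate clusters.
int_decision = clustered_with_int_threshold(21, 35, 62)
float_decision = clustered_with_float_threshold(21, 35, 62)
```

In this toy case the truncated threshold merges the two segments even though their identity is below the requested level, while the floating-point threshold keeps them apart.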
“…Matrices such the PMBEC [27], have been generated based on the two models that produce a minor increase in performance but ultimately are vulnerable to the same factors as their predecessors [11]. In 2008, for example, a miscalculation was discovered in the clustering protocol of the BLOSUM matrix [31]. Despite extensive characterization of the mistake, BLOSUM is still the standard for one of the largest alignment-capable databases available to date, BLAST [11].…”
Section: Introduction (mentioning)
confidence: 99%