2009
DOI: 10.1093/bioinformatics/btp135

Correction for phylogeny, small number of observations and data redundancy improves the identification of coevolving amino acid pairs using mutual information

Abstract: After evaluating two current methods, we demonstrate how the use of sequence-weighting techniques to reduce sequence redundancy and low-count corrections to account for small number of observations in limited size sequence families, can significantly improve the predictability of MI. The evaluation is made on large sets of both in silico-generated alignments as well as on biological sequence data. The methods included in the analysis are the APC (average product correction) and RCW (row-column weighting) metho…

Cited by 90 publications (119 citation statements)
References 24 publications
“…A statistically significant and diverse number of sequences (16272 sequences) in the Pfam database provided input for the MI computations. In MISTIC approach, sequence clustering is implemented to reduce sequence redundancy and sequence clusters are defined at a sequence identity threshold of 62% [148]. A lower bound of 400 sequences <62% identity is typically required in an MSA to yield statistically meaningful coevolutionary relationships.…”
Section: Methods (mentioning)
confidence: 99%
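
The clustering step described in this statement can be illustrated with a minimal sketch. The quoted text gives only the 62% identity threshold and the lower bound of 400 sequences; the greedy single-pass clustering, the gap handling in `identity`, and the function names below are assumptions made for illustration, not the MISTIC implementation.

```python
def identity(a, b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def greedy_clusters(msa, threshold=0.62):
    """Assign each sequence to the first cluster whose representative it matches.

    Hypothetical helper: the representative is simply the first member of a cluster.
    """
    clusters = []
    for i, seq in enumerate(msa):
        for cluster in clusters:
            if identity(seq, msa[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def sequence_weights(msa, threshold=0.62):
    """Weight each sequence by 1 / (size of its cluster), so each cluster sums to 1."""
    weights = [0.0] * len(msa)
    for cluster in greedy_clusters(msa, threshold):
        for i in cluster:
            weights[i] = 1.0 / len(cluster)
    return weights


def has_enough_clusters(msa, minimum=400, threshold=0.62):
    """Check the alignment against the lower bound quoted above."""
    return len(greedy_clusters(msa, threshold)) >= minimum
```

With these weights, the sum of all weights equals the number of clusters, which is one common way to express the effective number of sequences in an alignment.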
“…The MI score is calculated as a weighted sum of the log ratios between the observed and expected amino acid pair frequencies. The MI scores were translated into MI z-scores by comparing the MI values for each pair of positions with a distribution of prediction scores obtained from a large set of randomized MSAs [88]. The z-score is then calculated as the number of standard deviations that the observed MI value falls above the mean value obtained from the randomized MSAs.…”
Section: Calculations (mentioning)
confidence: 99%
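
The calculation described in this statement can be sketched as follows. This is a simplified illustration: it uses plain, unweighted frequencies with no low-count correction, and it builds the null distribution by shuffling one column, which stands in for the "large set of randomized MSAs" mentioned in the quote. The function names and the `n_shuffles` and `seed` parameters are hypothetical.

```python
import math
import random
import statistics
from collections import Counter


def mutual_information(col_i, col_j):
    """MI = sum over residue pairs of p(a,b) * log(p(a,b) / (p(a) * p(b)))."""
    n = len(col_i)
    pair_counts = Counter(zip(col_i, col_j))
    counts_i = Counter(col_i)
    counts_j = Counter(col_j)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        expected = (counts_i[a] / n) * (counts_j[b] / n)
        mi += p_ab * math.log(p_ab / expected)
    return mi


def mi_z_score(col_i, col_j, n_shuffles=200, seed=0):
    """Standard deviations by which the observed MI exceeds the shuffled mean."""
    rng = random.Random(seed)
    observed = mutual_information(col_i, col_j)
    shuffled = list(col_j)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks any pairing between the two columns
        null.append(mutual_information(col_i, shuffled))
    spread = statistics.pstdev(null)
    if spread == 0:
        return 0.0  # degenerate null distribution; no meaningful z-score
    return (observed - statistics.mean(null)) / spread
```

Two perfectly covarying columns with two equally frequent residues each give an MI of log 2, while two independent columns give an MI of 0.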
“…Many correction schemes for removing noise from the matrix of predicted contact scores have been examined [6,22,34,37,43,56,59], and the average product correction (APC) [8] came out as a clear winner and is used in almost all recent studies. However, it is widely acknowledged in the field that our limited understanding of what noise effects APC is correcting and why it is so effectively correcting them is severely impeding progress in developing better statistical methods to predict contacting residue pairs.…”
Section: Introduction (mentioning)
confidence: 99%
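
For orientation, the average product correction named in this statement is commonly written as corrected(i, j) = score(i, j) - mean(i) * mean(j) / overall mean, where mean(i) is the average score of position i against all other positions. A minimal sketch follows, assuming a symmetric score matrix given as a list of lists with the diagonal ignored; the function name `apc_correct` is hypothetical.

```python
def apc_correct(scores):
    """Subtract the average product correction from a symmetric score matrix."""
    n = len(scores)
    # Mean score of each position against every other position (diagonal excluded).
    position_mean = [
        sum(scores[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)
    ]
    # Every row has n - 1 entries, so the mean of row means is the overall pair mean.
    overall_mean = sum(position_mean) / n
    if overall_mean == 0:
        raise ValueError("overall mean score is zero; APC is undefined")
    corrected = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                background = position_mean[i] * position_mean[j] / overall_mean
                corrected[i][j] = scores[i][j] - background
    return corrected
```

A matrix with the same score for every pair is corrected to zero everywhere, while a single elevated pair on a uniform background remains the top-scoring pair after correction.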