Despite the rapid progress of protein residue contact prediction, predicted residue contact maps frequently contain many errors. However, information on residue pairing in β strands can be extracted from a noisy contact map because β-β interactions produce characteristic contact patterns. This information may benefit the tertiary structure prediction of mainly β proteins. In this work, we introduce a novel ridge-detection-based β-β contact predictor, RDb2C, to identify residue pairing in β strands from any predicted residue contact map. The algorithm adopts ridge detection, a well-developed technique in computer image processing, to capture consecutive residue contacts, and then uses a novel multi-stage random forest framework to integrate the ridge information with additional features for prediction. Starting from the predicted contact map of CCMpred, RDb2C outperforms all state-of-the-art methods on two conventional test sets of β proteins (BetaSheet916 and BetaSheet1452), achieving F1-scores of ~62% and ~76% at the residue level and strand level, respectively. Taking the prediction of the more advanced RaptorX-Contact as input, RDb2C achieves higher performance still, with F1-scores reaching ~76% and ~86% at the residue level and strand level, respectively. According to our tests on 61 mainly β proteins, improvement in β-β contact prediction can further improve structure prediction.
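For reference, the F1-score quoted above is conventionally the harmonic mean of precision and recall. The standard definition is given below; that the residue-level and strand-level scores count true positives (TP), false positives (FP), and false negatives (FN) over residue pairs and strand pairs, respectively, is an assumption rather than something stated here.

$$
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2PR}{P + R}
$$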
Availability: All source data and code are available at http://166.111.152.91/Downloads.html or on GitHub at https://github.com/wzmao/RDb2C.
Author summary
Because of their topological complexity, mainly β proteins are challenging targets in protein structure prediction. Knowledge of the pairing between β strands, especially the residue pairing pattern, can greatly facilitate the tertiary structure prediction of mainly β proteins. In this work, we developed a novel algorithm to identify the residue pairing in β strands from a predicted residue contact map. This method adopts the ridge detection technique to capture the characteristic pattern of β-β interactions from the map and then uses a multi-stage random forest framework to predict β-β contacts at the residue level. According to our tests, our method effectively improves the prediction of β-β contacts even from a highly noisy contact map. Moreover, the refined β-β contact information effectively improves the structural modeling of mainly β proteins.
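To illustrate how ridge detection can pick out a band of consecutive contacts, here is a minimal sketch in Python. It is not the RDb2C implementation: it uses a generic Hessian-eigenvalue ridge measure from image processing, and the function names `ridge_strength` and `toy_map`, the map size, and the noise level are all hypothetical choices made for illustration. The toy map places one anti-diagonal band (the pattern an antiparallel strand pair would be expected to produce) on a weak random background.

```python
import numpy as np


def ridge_strength(contact_map):
    """Hessian-based ridge strength for a 2D map of contact scores.

    Generic image-processing measure (an assumption, not the paper's
    exact filter): a bright ridge has strongly negative curvature in
    the direction across it, so we return max(-lambda_min, 0).
    """
    p = np.asarray(contact_map, dtype=float)
    # First derivatives along rows (axis 0) and columns (axis 1).
    gy, gx = np.gradient(p)
    # Second derivatives give the Hessian entries at each pixel.
    hyy, hyx = np.gradient(gy)
    hxy, hxx = np.gradient(gx)
    hxy = 0.5 * (hxy + hyx)  # symmetrize the mixed derivative
    # Smaller eigenvalue of the symmetric 2x2 matrix [[hyy, hxy], [hxy, hxx]].
    half_trace = 0.5 * (hxx + hyy)
    root = np.sqrt((0.5 * (hyy - hxx)) ** 2 + hxy ** 2)
    lam_min = half_trace - root
    return np.maximum(-lam_min, 0.0)


def toy_map(n=20, index_sum=24, seed=0):
    """Hypothetical noisy map with one anti-diagonal band where i + j == index_sum."""
    rng = np.random.default_rng(seed)
    m = 0.05 * rng.random((n, n))
    m = 0.5 * (m + m.T)  # contact maps are symmetric
    for i in range(n):
        j = index_sum - i
        if 0 <= j < n:
            m[i, j] = 0.9
    return m


if __name__ == "__main__":
    m = toy_map()
    s = ridge_strength(m)
    on_band = np.add.outer(np.arange(20), np.arange(20)) == 24
    print("mean strength on band: ", s[on_band].mean())
    print("mean strength off band:", s[~on_band].mean())
```

In a pipeline of the kind described above, a per-pixel ridge score like this would be one input feature among others for the downstream classifier; how RDb2C actually encodes and combines ridge information is not specified in this section.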