An essential step in engineering proteins and understanding disease-causing
missense mutations is to accurately model protein stability changes
when such mutations occur. Here, we developed a new sequence-based
predictor for the protein stability (PROST) change (Gibbs free energy change, ΔΔG) upon a single-point missense mutation. PROST extracts
multiple descriptors from the most promising sequence-based predictors,
such as BoostDDG, SAAFEC-SEQ, and DDGun. PROST also extracts descriptors
from iFeature and AlphaFold2. The extracted descriptors include sequence-based
features, physicochemical properties, evolutionary information, evolutionary-based
physicochemical properties, and predicted structural features. The
PROST predictor is a weighted average ensemble model based on extreme
gradient boosting (XGBoost) decision trees and an extra-trees regressor;
PROST is trained on both direct and hypothetical reverse mutations
using the S5294 data set (S2647 direct mutations + S2647 inverse mutations).
The parameters for the PROST model are optimized using grid searching
with 5-fold cross-validation, and feature importance analysis unveils
the most relevant features. The performance of PROST is evaluated
in a blinded manner, employing nine distinct data sets and existing
state-of-the-art sequence-based and structure-based predictors. This
method consistently performs well on frataxin, S217, S349, Ssym, S669,
Myoglobin, and CAGI5 data sets in blind tests and performs similarly to
state-of-the-art predictors on the p53 and S276 data sets. When the performance
of PROST is compared with the latest predictors such as BoostDDG,
SAAFEC-SEQ, ACDC-NN-seq, and DDGun, PROST outperforms these predictors.
A case study of mutation scanning of the frataxin protein for nine
wild-type residues demonstrates the utility of PROST. Taken together,
these findings indicate that PROST is a well-suited predictor when
no protein structural information is available. The source code of
PROST, data sets, examples, and pretrained models along with how to
use PROST are available at and .
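
The two data-handling ideas mentioned above, training on hypothetical reverse mutations and combining two regressors by a weighted average, can be illustrated with a minimal Python sketch. It is not the PROST implementation: the `Mutation` record, the function names, the example ΔΔG values, and the blending weight are all hypothetical, and the sketch assumes the usual thermodynamic antisymmetry, ΔΔG(B→A) = −ΔΔG(A→B), when building reverse mutations. In practice the features of a reverse mutation would be computed from the mutated sequence, a step omitted here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Mutation:
    """Hypothetical record for a single-point missense mutation."""
    wild_type: str  # one-letter code of the original residue
    position: int   # residue position in the sequence
    mutant: str     # one-letter code of the substituted residue
    ddg: float      # measured stability change for wild_type -> mutant


def add_reverse_mutations(direct: Sequence[Mutation]) -> List[Mutation]:
    """Return the direct mutations followed by their hypothetical reverses.

    Assumes antisymmetry: the reverse mutation swaps the two residues
    and flips the sign of the stability change.
    """
    reverse = [
        Mutation(m.mutant, m.position, m.wild_type, -m.ddg) for m in direct
    ]
    return list(direct) + reverse


def weighted_average(
    pred_a: Sequence[float], pred_b: Sequence[float], weight_a: float
) -> List[float]:
    """Blend two models' predictions; weight_a applies to pred_a."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction lists must have the same length")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]


# Illustrative values only; they are not taken from any data set.
direct = [Mutation("A", 10, "G", -1.5), Mutation("L", 42, "P", 2.0)]
augmented = add_reverse_mutations(direct)

# Stand-ins for the outputs of two trained regressors on two mutations.
blended = weighted_average([1.0, -2.0], [3.0, 0.0], weight_a=0.25)
```

Doubling the training set this way mirrors the S2647 + S2647 construction described above, and it encourages a model to predict ΔΔG values of opposite sign for a mutation and its reverse. The blending weight would normally be tuned on validation data rather than fixed by hand.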