Despite recent advances in understanding how protein surface properties affect thermostability, multi-factor rational design strategies have rarely been studied, and protein engineering urgently needs a simpler and more effective rational strategy. Here, we made a first attempt at a three-factor rational design strategy that combines three common structural features: protein flexibility, protein surface, and salt bridges. Escherichia coli AppA phytase served as the model enzyme for improving thermostability, and the structural and enzymatic features of the thermostable mutants designed with this strategy were analyzed comprehensively. Among the single mutants, two of five (Q206E and Y311K) were thermostable, a relatively high prediction success rate (40%). For the multiple mutants, the thermostable sites were combined with another site, I427L, which we had obtained by directed evolution; Q206E/I427L, Y311K/I427L, and Q206E/Y311K/I427L were all thermostable. Compared with wild-type AppA phytase, the Y311K/I427L mutant showed roughly double the thermostability (61.7% versus 30.97% after heating at 80 °C for 10 min) and a higher catalytic efficiency (4.46 versus 2.37), with almost no loss of catalytic activity. This multi-factor rational design strategy can be applied in practice as a thermostabilization strategy in place of the conventional single-factor approach.
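The ratios behind the figures quoted in this abstract can be checked with a short calculation. The sketch below uses only the numbers stated above; the helper name `fold_change` is illustrative, and the units of catalytic efficiency are not given in the abstract.

```python
def fold_change(mutant: float, wild_type: float) -> float:
    """Ratio of a mutant measurement to the corresponding wild-type measurement."""
    return mutant / wild_type


# Two of the five designed single mutants (Q206E and Y311K) were thermostable.
success_rate = 2 / 5

# Y311K/I427L versus wild type, after heating at 80 °C for 10 min.
thermostability_ratio = fold_change(61.7, 30.97)

# Catalytic efficiency, Y311K/I427L versus wild type (units not stated in the abstract).
efficiency_ratio = fold_change(4.46, 2.37)

print(f"prediction success rate:    {success_rate:.0%}")
print(f"thermostability ratio:      {thermostability_ratio:.2f}")
print(f"catalytic efficiency ratio: {efficiency_ratio:.2f}")
```

Both ratios come out just under 2, which is consistent with the abstract's description of thermostability as roughly doubled.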
Thermophilic proteins have great potential as biocatalysts in biotechnology. Machine learning algorithms are increasingly used to identify such enzymes, reducing or even eliminating the need for experimental studies. While most previously used machine learning methods were based on manually designed features, we developed BertThermo, a model that uses Bidirectional Encoder Representations from Transformers (BERT) as an automatic feature extraction tool. This method combines a variety of machine learning algorithms and feature engineering methods, while relying on single-feature encoding based on the protein sequence alone for model input. BertThermo achieved accuracies of 96.97% and 97.51% in 5-fold cross-validation and in independent testing, respectively, identifying thermophilic proteins more reliably than any previously described predictive algorithm. Additionally, BertThermo was tested on a balanced dataset, an imbalanced dataset, and a dataset with homologous sequences, and the results show that BertThermo was the most robust when compared with state-of-the-art methods. The source code of BertThermo is available.
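For readers unfamiliar with the evaluation protocol, the following is a minimal sketch of how a 5-fold cross-validation accuracy is computed. It is not the BertThermo pipeline: the toy data, the `fit_midpoint` classifier, and all function names are assumptions made purely for illustration, and the splitting is plain shuffled k-fold rather than whatever scheme the authors used.

```python
import random
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[float], int]


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 and return k (train, test) index pairs."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


def fit_midpoint(xs: List[float], ys: List[int]) -> Predictor:
    """Hypothetical toy classifier: threshold halfway between the two class means."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    threshold = (mean0 + mean1) / 2
    return lambda x: int(x > threshold)


def cross_val_accuracy(xs: Sequence[float], ys: Sequence[int], k: int = 5) -> float:
    """Mean test accuracy over k folds; the model is refit on each training split."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        predict = fit_midpoint([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(predict(xs[i]) == ys[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / k


# Toy, well-separated data standing in for sequence-derived features.
X = [i / 100 for i in range(10)] + [1 + i / 100 for i in range(10)]
y = [0] * 10 + [1] * 10
mean_accuracy = cross_val_accuracy(X, y, k=5)
print(f"5-fold accuracy on toy data: {mean_accuracy:.2%}")
```

The independent-test accuracy reported in the abstract is a separate number: it comes from a held-out dataset that takes no part in any of the folds.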
Anticancer peptides (ACPs) represent a promising new therapeutic approach in cancer treatment. They can target cancer cells without affecting healthy tissues or altering normal physiological functions. Machine learning algorithms have increasingly been used to predict peptide sequences with potential ACP effects. This study analyzed four benchmark datasets using a well-established random forest (RF) algorithm. The peptide sequences were converted into 566 physicochemical features extracted from the amino acid index (AAindex) library, which were then subjected to feature selection using four methods: light gradient-boosting machine (LGBM), analysis of variance (ANOVA), chi-squared test (Chi2), and mutual information (MI). The features identified by the four methods were presented and merged using Venn diagrams, yielding 19 key amino acid physicochemical properties that can be used to predict the likelihood of a peptide sequence functioning as an ACP. The results were quantified with performance evaluation metrics to determine the accuracy of the predictions. This study aims to enhance the efficiency of designing peptide sequences for cancer treatment.
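The Venn-diagram merging step can be illustrated with plain set operations. The abstract does not state the exact merging rule, so the sketch below assumes a vote threshold (for example, keeping features chosen by all four selectors); the feature names are placeholders, not real AAindex entries, and `consensus_features` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Set


def consensus_features(selected: Dict[str, Set[str]], min_votes: int) -> Set[str]:
    """Return the features chosen by at least `min_votes` selection methods."""
    votes = Counter(feature for features in selected.values() for feature in features)
    return {feature for feature, count in votes.items() if count >= min_votes}


# Placeholder output of the four selectors (assumed, for illustration only).
selected = {
    "LGBM": {"prop_A", "prop_B", "prop_C"},
    "ANOVA": {"prop_A", "prop_B", "prop_D"},
    "Chi2": {"prop_A", "prop_B", "prop_C"},
    "MI": {"prop_A", "prop_B", "prop_C"},
}

# Centre of the four-set Venn diagram: features every method agrees on.
shared_by_all = consensus_features(selected, min_votes=len(selected))

# A looser rule: features chosen by at least three of the four methods.
shared_by_majority = consensus_features(selected, min_votes=3)

print(sorted(shared_by_all))
print(sorted(shared_by_majority))
```

Lowering `min_votes` widens the consensus set, trading agreement between methods for coverage.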