MIBPred: Ensemble Learning-Based Metal Ion-Binding Protein Classifier

Zhang, Hong-Qi; Liu, Shang-Hua; Li, Rui; Yu, Jun-Wen; Ye, Dong-Xin; Yuan, Shi-Shi; Lin, Hao; Huang, Cheng-Bing; Tang, Hua

doi:10.1021/acsomega.3c09587

ACS Omega

2024

Abstract: In biological organisms, metal ion-binding proteins participate in numerous metabolic activities and are closely associated with various diseases. To accurately predict whether a protein binds to metal ions and the type of metal ion-binding protein, this study proposed a classifier named MIBPred. The classifier incorporated advanced Word2Vec technology from the field of natural language processing to extract semantic features of the protein sequence language and combined them with position-specific score matrix…
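The abstract is cut off before it says exactly how the two feature sets are joined, so the following is only a minimal sketch under stated assumptions. The sequence is split into overlapping 3-mers that play the role of Word2Vec "words"; a pre-trained k-mer embedding table is assumed to exist (the toy table in the usage is invented); the per-k-mer vectors are averaged into one sequence vector; and a fixed-length PSSM summary (the column means of the L x 20 matrix) is appended. The helper names `kmers`, `embed_sequence`, `pssm_column_means`, and `combined_features` are hypothetical and are not taken from MIBPred.

```python
from typing import Dict, List, Sequence

Embedding = Dict[str, List[float]]


def kmers(seq: str, k: int = 3) -> List[str]:
    """Split a protein sequence into overlapping k-mers (the 'words')."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def embed_sequence(seq: str, table: Embedding, k: int = 3) -> List[float]:
    """Average the vectors of all k-mers present in a (non-empty) embedding table.

    Assumption: unseen k-mers are skipped; if none are found, a zero vector
    is returned.
    """
    dim = len(next(iter(table.values())))
    total = [0.0] * dim
    hits = 0
    for word in kmers(seq, k):
        vec = table.get(word)
        if vec is None:
            continue  # k-mer not in the vocabulary: ignore it
        total = [t + v for t, v in zip(total, vec)]
        hits += 1
    return [t / hits for t in total] if hits else total


def pssm_column_means(pssm: Sequence[Sequence[float]]) -> List[float]:
    """Collapse an L x 20 PSSM to 20 values by averaging each column over residues."""
    n_rows = len(pssm)
    return [sum(row[j] for row in pssm) / n_rows for j in range(len(pssm[0]))]


def combined_features(seq: str, table: Embedding,
                      pssm: Sequence[Sequence[float]], k: int = 3) -> List[float]:
    """Concatenate the pooled k-mer embedding with the PSSM summary (hypothetical fusion)."""
    return embed_sequence(seq, table, k) + pssm_column_means(pssm)


if __name__ == "__main__":
    toy_table = {"MKV": [1.0, 0.0], "KVL": [0.0, 1.0]}  # invented 2-d embeddings
    toy_pssm = [[1.0] * 20, [3.0] * 20]                 # invented 2-residue PSSM
    print(combined_features("MKVL", toy_table, toy_pssm))
```

Averaging is only one way to pool k-mer vectors; summing or max-pooling are equally plausible, and the pooling MIBPred actually uses is not stated in this excerpt.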

Cited by 2 publications

References: 70 publications (89 reference statements)

Glypred: Lysine Glycation Site Prediction via CCU–LightGBM–BiLSTM Framework with Multi-Head Attention Mechanism

Zuo, Zhang, Dong et al., 2024

J. Chem. Inf. Model.

Glycation, a type of posttranslational modification, preferentially occurs on lysine and arginine residues, impairing protein function and altering protein characteristics. This process is linked to diseases such as Alzheimer’s disease, diabetes, and atherosclerosis. Traditional wet-lab experiments are time-consuming, whereas machine learning has significantly streamlined the prediction of protein glycation sites. Despite promising results, challenges remain, including data imbalance, feature redundancy, and suboptimal classifier performance. This research introduces Glypred, a lysine glycation site prediction model that combines ClusterCentroids Undersampling (CCU), LightGBM, and a bidirectional long short-term memory network (BiLSTM), with a multihead attention mechanism integrated into the BiLSTM. To achieve this, the study undertakes several key steps: selecting diverse feature types to capture comprehensive protein information, employing a cluster-based undersampling strategy to balance the data set, using LightGBM for feature selection to enhance model performance, and implementing a bidirectional LSTM network for classification. Together, these approaches are intended to let Glypred identify glycation sites accurately and robustly. For feature encoding, five distinct feature types (AAC, KMER, DR, PWAA, and EBGW) were selected to capture a broad spectrum of protein sequence and biological information. These encoded features were integrated and validated to ensure comprehensive protein information acquisition. To address the highly imbalanced ratio of positive to negative samples, several undersampling algorithms were evaluated, including random undersampling, NearMiss, the edited nearest neighbor rule, and CCU. CCU was ultimately chosen to remove redundant nonglycated training data, yielding a balanced data set that improves the model’s accuracy and robustness.
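ClusterCentroids undersampling replaces the majority (here, nonglycated) class with the centroids of a k-means clustering whose number of clusters equals the minority-class size. The sketch below illustrates that idea in plain Python for a binary problem. It is a simplification, not Glypred's code: it seeds k-means deterministically with the first k majority points instead of random or k-means++ seeding, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Sequence[float]]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def kmeans(points: List[Sequence[float]], k: int, iters: int = 50) -> List[Point]:
    """Plain Lloyd's k-means; assumption: seeded with the first k points."""
    centroids: List[Point] = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters: List[List[Sequence[float]]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [_mean(cl) if cl else centroids[i] for i, cl in enumerate(clusters)]
        if updated == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = updated
    return centroids


def cluster_centroids_undersample(
    X: List[Sequence[float]], y: List[int], majority: int = 0
) -> Tuple[List[Sequence[float]], List[int]]:
    """Replace majority-class samples with as many centroids as there are minority samples."""
    minority_X = [x for x, label in zip(X, y) if label != majority]
    majority_X = [x for x, label in zip(X, y) if label == majority]
    centroids = kmeans(majority_X, k=len(minority_X))
    X_balanced = list(centroids) + minority_X
    y_balanced = [majority] * len(centroids) + [label for label in y if label != majority]
    return X_balanced, y_balanced
```

With six majority points and two minority points, the call returns two synthetic majority samples (cluster centroids) plus the two untouched minority samples, giving a 1:1 class ratio.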
For feature selection, the LightGBM ensemble learning algorithm was employed to reduce feature dimensionality by identifying the most significant features. This approach accelerates model training, enhances generalization capabilities, and ensures good transferability of the model. Finally, a bidirectional long short-term memory network was used as the classifier, with a network structure designed to capture glycation modification site features from both forward and backward directions. To prevent overfitting, appropriate regularization parameters and dropout rates were introduced, achieving efficient classification. Experimental results show that Glypred achieved optimal performance. This model provides new insights for bioinformatics and encourages the application of similar strategies in other fields. A lysine glycation site prediction software tool was also developed using the PyQt5 library, offering researchers an auxiliary screening tool to reduce workload and improve efficiency. The software and data sets are available on GitHub: .
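LightGBM-based selection, as described, keeps the features the model ranks as most important. Assuming an importance score per feature is already available (for example, from the `feature_importances_` attribute of a fitted `lightgbm.LGBMClassifier`), the selection step itself reduces to keeping the top-k columns. The helpers `select_top_k` and `reduce_columns` are hypothetical, and the number of features Glypred retains is not given in the abstract.

```python
from typing import List, Sequence


def select_top_k(importances: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring features, in original column order."""
    # Python's sort is stable even with reverse=True, so ties keep column order.
    ranked = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    return sorted(ranked[:k])


def reduce_columns(X: Sequence[Sequence[float]], keep: Sequence[int]) -> List[List[float]]:
    """Keep only the selected feature columns of each sample."""
    return [[row[j] for j in keep] for row in X]


if __name__ == "__main__":
    scores = [5.0, 0.0, 12.0, 3.0]  # invented importance scores
    keep = select_top_k(scores, k=2)
    print(keep, reduce_columns([[1, 2, 3, 4], [5, 6, 7, 8]], keep))
```

Returning the kept indices (rather than the reduced matrix alone) lets the same column subset be applied to the training, validation, and independent test sets.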

A Soft Voting Ensemble Model for Hotel Revenue Prediction

Jiang, Ni, Chen, 2024

IJEFM

In recent years, the hotel industry has faced unprecedented opportunities and challenges due to the increasing demand for travel and business trips. This growth brings significant opportunities but also complicates resource management and price setting. Accurate hotel revenue prediction is crucial for the hotel industry because it informs pricing strategies and resource allocation. However, traditional hotel revenue prediction models fail to capture the diversity and complexity of hotel revenue data, resulting in inefficient and inaccurate predictions. With the development of ensemble learning, its application to hotel revenue prediction has emerged as an influential research direction. This study proposes a soft voting ensemble model for hotel revenue prediction, which includes six base models: Convolutional Neural Network, K-nearest Neighbors, Linear Regression, Long Short-term Memory, Multi-layer Perceptron, and Recurrent Neural Network. First, the hyperparameters of the base models are optimized with Bayesian optimization. Next, a soft voting ensemble method is used to aggregate the predictions of the base models. Finally, experimental results on the hotel revenue dataset show that the soft voting ensemble model outperforms the individual base models across six key performance metrics, providing hotel managers with more accurate revenue prediction tools to support management decisions and resource allocation strategies. This study confirms the effectiveness of the soft voting ensemble model in improving the accuracy of hotel revenue forecasts and demonstrates its potential for strategic planning in the modern hotel industry.
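For a numeric target such as revenue, soft voting is commonly implemented as an optionally weighted average of the base models' predictions, which is also what scikit-learn's `VotingRegressor` does. The abstract does not state the weighting scheme, so the sketch below assumes equal weights by default; `soft_vote` is a hypothetical helper and the prediction values in the usage are invented.

```python
from typing import List, Optional, Sequence


def soft_vote(predictions: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Aggregate base-model predictions by (weighted) averaging.

    predictions[m][i] is base model m's prediction for sample i.
    """
    n_models = len(predictions)
    if weights is None:
        weights = [1.0] * n_models  # assumption: equal weights
    if len(weights) != n_models:
        raise ValueError("one weight per base model is required")
    total_weight = sum(weights)
    n_samples = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions)) / total_weight
        for i in range(n_samples)
    ]


if __name__ == "__main__":
    # Three hypothetical base models, two samples each.
    base_preds = [[100.0, 200.0], [120.0, 180.0], [110.0, 190.0]]
    print(soft_vote(base_preds))
```

In the paper's setting, each row of `predictions` would come from one of the six tuned base models; weights could instead be set from validation performance, but that choice is an assumption here.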
