Background:
Krüppel-like factors (KLFs) are a family of transcription factors containing zinc fingers that regulate various cellular processes. KLF proteins are associated with human diseases, such as cancer, cardiovascular diseases, and metabolic disorders. The KLF family consists of 18 members with diverse expression profiles across numerous tissues. Accurate identification and annotation of KLF proteins is crucial, given their involvement in important biological functions. Although experimental approaches can identify KLF proteins precisely, large-scale identification is complicated, slow, and expensive.
Methods:
In this study, we developed RDR100, a novel random forest (RF)-based framework for predicting KLF proteins based on their primary sequences. First, we identified the optimal encodings for ten different features using a recursive feature elimination approach, and then trained their respective model using five distinct machine learning (ML) classifiers. Results: The performance of all models was assessed using independent datasets, and RDR100 was selected as the final model based on its consistent performance in cross-validation and independent evaluation.
Conclusion:
Our results demonstrate that RDR100 is a robust predictor of KLF proteins. RDR100 web server is available at https://procarb.org/RDR100/.