Background: Protein secondary structure (SS) prediction is widely recognized as important because solving it helps reveal the role a protein plays in an organism. Because experimental methods are expensive and sometimes infeasible, many SS predictors, mainly based on machine learning methods, have been proposed over the years. SS prediction is an imbalanced classification problem, so it should not be judged by the commonly used Q3/Q8 metrics. Moreover, because the benchmark datasets are not random samples, classical statistical null hypothesis testing based on the Neyman-Pearson approach is not appropriate. In addition, state-of-the-art predictors usually have relatively long prediction times. Results: We present ProteinUnet2, a new deep network for SS prediction based on the U-Net convolutional architecture. We also propose a new statistical methodology for assessing prediction performance, which combines statistical significance from Fisher-Pitman permutation tests with practical significance measured by Cohen’s effect size. Through an extensive evaluation study, we compare the performance of ProteinUnet2 with two state-of-the-art methods, SAINT and SPOT-1D, on the benchmark datasets TEST2016, TEST2018, and CASP12. Conclusions: Our results suggest that ProteinUnet2 has much shorter prediction times while matching (or outperforming) the prediction performance of the mentioned predictors. We strongly believe that our proposed statistical methodology will be adopted and used (and even expanded) by the research community.
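The statistical methodology named in the abstract above (permutation tests for significance, Cohen's effect size for practical significance) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes paired per-protein scores for two predictors, uses a Monte Carlo sign-flip variant of the permutation test, and uses the paired form of Cohen's d (mean difference divided by the standard deviation of the differences). The function names, the number of resamples, and the example scores are all hypothetical.

```python
import random
import statistics


def paired_permutation_test(scores_a, scores_b, n_resamples=10000, seed=0):
    """Two-sided sign-flip permutation test on paired scores.

    Assumption: each position holds the scores of two predictors on the
    same protein, so under the null hypothesis the sign of each paired
    difference is exchangeable.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(statistics.fmean(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Randomly flip the sign of every paired difference.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.fmean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


def cohens_d_paired(scores_a, scores_b):
    """Paired Cohen's d: mean difference over the SD of the differences."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return statistics.fmean(diffs) / statistics.stdev(diffs)


if __name__ == "__main__":
    # Hypothetical per-protein scores, for illustration only.
    model_a = [0.90, 0.80, 0.85, 0.95, 0.70, 0.88, 0.92, 0.81]
    model_b = [0.80, 0.75, 0.80, 0.85, 0.66, 0.80, 0.85, 0.78]
    print("p-value:", paired_permutation_test(model_a, model_b))
    print("Cohen's d:", cohens_d_paired(model_a, model_b))
```

Reporting both numbers together separates two questions: whether a difference between predictors is unlikely under exchangeability, and whether it is large enough to matter in practice.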
The main problem discussed in this paper concerns the importance of a hydrophobic core in β-sandwich supersecondary structures. The aim of this research is to propose an alternative structural classification of the relationship between sequence and spatial structure. The set of analyzed proteins contains very diverse examples (in terms of source organism, chain length, domain composition, ligand and metal complexation, and quaternary structure), which allows the conclusions to be generalized. The biological functions of the proteins in question are also fundamentally different. The only common feature of these proteins is the presence of a β-sandwich or β-sandwich-like domain. The dataset is taken from an alternative classification of the secondary and supersecondary structure of sandwich-like domains. The results show that the secondary and supersecondary