Julia Abel scite author profile

In the present article we propose the application of variants of the mutual information function as characteristic fingerprints of biomolecular sequences for classification analysis. In particular, we consider the resolved mutual information functions based on Shannon-, Rényi-, and Tsallis-entropy. In combination with interpretable machine learning classifier models based on generalized learning vector quantization, a powerful methodology for sequence classification is achieved which allows substantial knowledge extraction in addition to the high classification ability due to the model-inherent robustness. Any potential (slightly) inferior performance of the used classifier is compensated by the additional knowledge provided by interpretable models. This knowledge may assist the user in the analysis and understanding of the used data and considered task. After theoretical justification of the concepts, we demonstrate the approach for various example data sets covering different areas in biomolecular sequence analysis.

show abstract

Alignment-Free Sequence Comparison: A Systematic Survey From a Machine Learning Perspective

Bohnsack¹,

Kaden²,

Abel³

et al. 2023

IEEE/ACM Trans. Comput. Biol. and Bioinf.

View full text Add to dashboard Cite

Detection of native and mirror protein structures based on Ramachandran plot analysis by interpretable machine learning models

Villmann

Abel

Bohnsack

et al. 2020

Preprint

View full text Add to dashboard Cite

In this contribution the discrimination between native and mirror models of proteins according to their chirality is tackled based on the structural protein information. This information is contained in the Ramachandran plots of the protein models. We provide an approach to classify those plots by means of an interpretable machine learning classifier - the Generalized Matrix Learning Vector Quantizer. Applying this tool, we are able to distinguish with high accuracy between mirror and native structures just evaluating the Ramachandran plots. The classifier model provides additional information regarding the importance of regions, e.g. α-helices and β-strands, to discriminate the structures precisely. This importance weighting differs for several considered protein classes.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Julia Abel

The Resolved Mutual Information Function as a Structural Fingerprint of Biomolecular Sequences for Interpretable Machine Learning Classifiers

Alignment-Free Sequence Comparison: A Systematic Survey From a Machine Learning Perspective

Detection of native and mirror protein structures based on Ramachandran plot analysis by interpretable machine learning models

Contact Info

Product

Resources

About