Knot or not? Identifying unknotted proteins in knotted families with sequence‐based Machine Learning model

Sikora, Maciej; Klimentova, Eva; Uchal, Dawid; Sramkova, Denisa; Perlinska, Agata P.; Nguyen, Mai Lan; Korpacz, Marta; Malinowska, Roksana; Nowakowski, Szymon; Rubach, Pawel; Simecek, Petr; Sulkowska, Joanna I.

doi:10.1002/pro.4998

Protein Science

2024

Abstract: Knotted proteins, although scarce, are crucial structural components of certain protein families, and their roles continue to be a topic of intense research. Capitalizing on the vast collection of protein structure predictions offered by AlphaFold (AF), this study computationally examines the entire UniProt database to create a robust dataset of knotted and unknotted proteins. Utilizing this dataset, we develop a machine learning (ML) model capable of accurately predicting the presence of knots in protein stru…

Cited by 3 publications

References 63 publications

Sequence‐Similar Protein Domain Pairs With Structural or Topological Dissimilarity

Røgen

2024

Proteins

For a variety of applications, protein structures are clustered by sequence similarity, and sequence‐redundant structures are disregarded. Sequence‐similar chains are likely to have similar structures, but significant structural variation, as measured with RMSD, has been documented for sequence‐similar chains and found usually to have a functional explanation. Moving two neighboring stretches of backbone through each other may change the chain topology and alter possible folding paths. The size of this motion is compatible to a variation in a flexible loop. We search and find domains with alternate chain topology in CATH4.2 sequence families relatively independent of sequence identity and of structural similarity as measured by RMSD. Structural, topological, and functional representative sets should therefore keep sequence‐similar domains not just with structural variation but also with topological variation. We present BCAlign that finds Alignment and superposition of protein Backbone Curves by optimizing a user chosen convex combination of structural derivation and derivation between the structure‐based sequence alignment and an input sequence alignment. Steric and topological obstructions from deforming a curve into an aligned curve are then found by a previously developed algorithm. For highly sequence‐similar domains, sequence‐based structural alignment better represents the chains motion and generally reveals larger structural and topological variation than structure‐based does. Fold‐switching protein pairs have been reported to be most frequent between X‐ray and NMR structures and estimated to be underrepresented in the PDB as the alternate configuration is harder to resolve. Here we similarly find chain topology most frequently altered between X‐ray and NMR structures.

Everything AlphaFold tells us about protein knots

Perlinska, Sikora, Sulkowska

2024

Journal of Molecular Biology

Knots and θ-Curves Identification in Polymeric Chains and Native Proteins Using Neural Networks

Bruno da Silva, Gabrovšek, Korpacz et al.

2024

Macromolecules

Entanglement in proteins is a fascinating structural motif that is neither easy to detect via traditional methods nor fully understood. Recent advancements in AI-driven models have predicted that millions of proteins could potentially have a nontrivial topology. Herein, we have shown that long short-term memory (LSTM)-based neural networks (NN) architecture can be applied to detect, classify, and predict entanglement not only in closed polymeric chains but also in polymers and protein-like structures with open knots, actual protein configurations, and also θ-curves motifs. The analysis revealed that the LSTM model can predict classes (up to the 6₁ knot) accurately for closed knots and open polymeric chains, resembling real proteins. In the case of open knots formed by protein-like structures, the model displays robust prediction capabilities with an accuracy of 99%. Moreover, the LSTM model with proper features, tested on hundreds of thousands of knotted and unknotted protein structures with different architectures predicted by AlphaFold 2, can distinguish between the trivial and nontrivial topology of the native state of the protein with an accuracy of 93%.
