The AI revolution sparked by natural language and image processing has brought new ideas and research paradigms to the field of protein computing. One significant advancement is the development of pre-trained protein language models through self-supervised learning on massive protein sequence datasets. These pre-trained models encode rich information about protein sequences, evolution, structures, and even functions, which can be readily transferred to a variety of downstream tasks and show robust generalization. More recently, researchers have begun developing multimodal pre-trained models that integrate more diverse types of data. This review summarizes recent studies in this direction from the following aspects. First, it reviews pre-trained models that integrate protein structures into language models; this is of particular importance because structure is the primary determinant of a protein's function. Second, pre-trained models that integrate protein dynamics information are introduced; these models may benefit downstream tasks such as protein-protein interaction prediction, soft docking of ligands, and interactions involving allosteric proteins and intrinsically disordered proteins. Third, pre-trained models that integrate knowledge such as Gene Ontology annotations are described. Fourth, we briefly introduce pre-trained models for RNA. Finally, we present the most recent developments in protein design and discuss how these models relate to the aforementioned pre-trained models that integrate protein structure information.
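As an illustration of the transfer-learning pattern the abstract describes (not code from the review itself), the sketch below embeds a protein sequence with a publicly available pre-trained protein language model and pools the per-residue representations into a fixed-size vector that a downstream predictor could consume. It assumes the HuggingFace transformers library and the released ESM-2 checkpoint "facebook/esm2_t6_8M_UR50D"; the example sequence is arbitrary.

```python
# Minimal sketch: extract a protein embedding from a pre-trained protein
# language model (ESM-2) for reuse in downstream tasks.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "facebook/esm2_t6_8M_UR50D"  # small ESM-2 checkpoint (assumption: available via HuggingFace)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # arbitrary example sequence

with torch.no_grad():
    inputs = tokenizer(sequence, return_tensors="pt")
    outputs = model(**inputs)

# Mean-pool the per-residue hidden states into a single protein-level embedding.
# A lightweight head (e.g. logistic regression) trained on such embeddings is a
# common way to transfer the pre-trained model to function or interaction prediction.
embedding = outputs.last_hidden_state.mean(dim=1)
print(embedding.shape)  # (1, hidden_size)
```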