Antibodies are a specific class of proteins produced by the adaptive immune system in response to invading pathogens, and mining the information encoded in antibody amino acid sequences can benefit both antibody property prediction and novel therapeutic development. Protein-specific pre-trained models have been used to extract latent representations from protein sequences that contain structural, functional, and homologous information. However, there is still room for improvement in pre-trained models for antibody sequences. On the one hand, existing protein pre-training models mainly reuse pre-trained language model approaches without fully considering the differences between protein sequences and natural language sequences; on the other hand, antibodies have unique characteristics compared with other proteins, which should be incorporated through specifically designed training methods. Here, we present a pre-trained model of antibody sequences, Pre-training with A Rational Approach for antibodies (PARA), that employs a training strategy conforming to antibody sequence patterns and an advanced NLP self-encoding model structure. We evaluate PARA on several tasks by comparing it with several published pre-trained antibody models. The results show that PARA significantly outperforms the selected antibody pre-training models on these tasks, suggesting that PARA has an advantage in capturing antibody sequence information. To the best of our knowledge, PARA is the first antibody language model that takes into account the features of antibody sequences. We believe that the antibody latent representation provided by PARA can substantially facilitate studies in relevant areas, such as antibody structure prediction, affinity prediction, and antibody de novo design.