Natural language processing models pre-trained on a large natural language corpus can transfer learned knowledge to the protein domain through fine-tuning on specific in-domain tasks. However, few studies have focused on enriching such protein language models by jointly learning protein properties from strongly correlated protein tasks. Here we designed a multi-task learning (MTL) architecture that extracts implicit structural and evolutionary information from three sequence-level classification tasks: protein family, superfamily and fold classification. Motivated by the contextual similarities between human language and protein sequences, we employed BERT, pre-trained on a large natural language corpus, as the backbone for encoding protein sequences. More importantly, the knowledge encoded in the MTL stage transfers well to the more fine-grained downstream tasks of TAPE. Experiments on structure- and evolution-related applications demonstrate that our approach outperforms many state-of-the-art Transformer-based protein models, especially in remote homology detection.
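The multi-task setup described above, one shared encoder feeding three classification tasks, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_encoder` is a hypothetical stand-in for the pooled BERT output, the class counts are placeholders, and the equally weighted sum of per-task cross-entropy losses is an assumed (common) way to combine the tasks.

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Cross-entropy of one example given unnormalized logits and an integer label."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])

def toy_encoder(sequence, hidden_dim=16):
    """Hypothetical stand-in for the shared BERT encoder: a deterministic toy embedding."""
    vec = np.zeros(hidden_dim)
    for i, residue in enumerate(sequence):
        vec[(ord(residue) + i) % hidden_dim] += 1.0
    return vec / max(len(sequence), 1)

class MultiTaskHeads:
    """Task-specific linear classifiers on top of one shared sequence embedding."""

    def __init__(self, hidden_dim, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        # One (weights, bias) pair per task; all heads read the same pooled vector.
        self.heads = {
            task: (rng.normal(0.0, 0.02, size=(hidden_dim, n)), np.zeros(n))
            for task, n in n_classes.items()
        }

    def logits(self, pooled):
        return {task: pooled @ W + b for task, (W, b) in self.heads.items()}

    def loss(self, pooled, labels, weights=None):
        """Weighted sum of per-task losses (equal weights by default, an assumption)."""
        weights = weights or {task: 1.0 for task in self.heads}
        out = self.logits(pooled)
        return sum(weights[t] * softmax_cross_entropy(out[t], labels[t]) for t in self.heads)

# Placeholder class counts for the three tasks (hypothetical, not from the paper).
heads = MultiTaskHeads(16, {"family": 5, "superfamily": 4, "fold": 3})
pooled = toy_encoder("MKTAYIAKQR")
total = heads.loss(pooled, {"family": 2, "superfamily": 1, "fold": 0})
print(round(total, 3))
```

Because gradients from all three losses flow into the same encoder, the shared representation is pushed to capture information useful for family, superfamily and fold prediction at once. With near-zero initial logits, each task contributes roughly `log(n_classes)` to the total.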
Insect pest recognition has long been an important problem in agriculture and ecology. Because different insect species often differ only slightly in appearance, they are hard to distinguish even for human experts, which makes fine-grained recognition with machine learning methods increasingly important. In this study, we proposed a feature fusion network that synthesizes the feature representations of different backbone models. First, we employed one CNN-based backbone, ResNet, and two attention-based backbones, Vision Transformer and Swin Transformer, and used Grad-CAM to localize the important regions of insect images. To make Grad-CAM applicable to these attention-based models, we designed new architectures for the two Transformers. We then proposed an attention-selection mechanism that reconstructs the attention area by integrating these important regions, so that the partial but key representations complement each other. In this way, only the part of the image that carries the most crucial decision-making information is needed for insect recognition. We tested classification performance first on 20 insect species randomly selected from the IP102 dataset and then on all 102 species. Experimental results show that the proposed approach outperforms other advanced CNN-based models. More importantly, the attention-selection mechanism shows good robustness to augmented images.
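The localization step relies on Grad-CAM, which weights each feature-map channel by its spatially averaged gradient with respect to the class score and keeps the positive part of the weighted sum. The sketch below shows that computation on arrays that are assumed to be already extracted from a backbone. The fusion step is a hypothetical illustration only: `fuse_and_select` averages the heatmaps and keeps the top fraction of locations, which is not the paper's attention-selection mechanism, and it assumes all heatmaps have already been resized to a common H × W.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from one layer's feature maps and class-score gradients.

    activations, gradients: arrays of shape (K, H, W) for K channels.
    """
    alphas = gradients.mean(axis=(1, 2))             # channel importance: spatially averaged gradients
    cam = np.tensordot(alphas, activations, axes=1)  # weighted sum over channels -> (H, W)
    cam = np.maximum(cam, 0.0)                       # ReLU keeps regions that raise the class score
    peak = cam.max()
    return cam / peak if peak > 0 else cam           # normalize to [0, 1]

def fuse_and_select(cams, keep_fraction=0.25):
    """Hypothetical fusion: average same-sized heatmaps, keep the top fraction of locations."""
    fused = np.mean(np.stack(cams), axis=0)
    threshold = np.quantile(fused, 1.0 - keep_fraction)
    return fused >= threshold                         # boolean (H, W) mask of selected regions

def apply_mask(image, mask):
    """Zero out unselected pixels; image is (H, W, C), mask is (H, W)."""
    return image * mask[..., None]

# Toy example: two channels, each active at one corner; only channel 0 helps the class.
acts = np.zeros((2, 4, 4))
acts[0, 0, 0] = 1.0
acts[1, 3, 3] = 1.0
grads = np.ones((2, 4, 4))
grads[1] = -1.0
cam = grad_cam(acts, grads)
print(cam[0, 0], cam[3, 3])  # only the corner supported by channel 0 survives the ReLU
```

Averaging is only one possible fusion rule; taking an element-wise maximum instead would favor regions that any single backbone finds important.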