As molecular data grow in diversity and scale, there is growing appreciation for applying machine learning and statistical methods to gain new biological insights. An important step towards this aim is relation extraction, which determines whether an interaction exists between two or more biological entities in a published study. Here, we employed natural-language processing (continuous bag-of-words, CBOW) and a deep recurrent neural network (bidirectional LSTM) to predict relations between biological entities that describe protein subcellular localisation in plants. We applied our system to 1700 published Arabidopsis protein subcellular studies from the manually curated SUBA dataset. The system extracted relevant text, and the classifier predicted interactions between protein name, subcellular localisation and experimental methodology. It achieved final precision, recall, accuracy and F1 scores of 0.951, 0.828, 0.893 and 0.884, respectively. The classifier was subsequently tested on a similar problem in crop species (CropPAL) and showed comparable accuracy (0.897). Consequently, our approach can be used to extract protein functional features from unstructured text in the literature with high accuracy. The developed system will improve dissemination of protein functional data to the scientific community and unlock the potential of big data text analytics for generating new hypotheses from diverse datasets.
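The F1 score reported above is conventionally the harmonic mean of precision and recall. The short sketch below recomputes it from the rounded figures in the abstract as a consistency check. The function name `f1_score` and the variable names are illustrative choices and are not taken from the study, whose evaluation code is not shown here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the standard F1)."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported in the abstract.
reported_precision = 0.951
reported_recall = 0.828
reported_f1 = 0.884

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 recomputed from rounded precision and recall: {f1:.3f}")
print(f"Difference from reported F1: {abs(f1 - reported_f1):.3f}")
```

The recomputed value is about 0.885, within 0.001 of the reported 0.884. A gap of that size is what one would expect when the inputs have themselves been rounded to three decimal places.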