Background Identifying new potential treatment options for medical conditions that cause human disease burden is a central task of biomedical research. Since all candidate drugs cannot be tested with animal and clinical trials, in vitro approaches are first attempted to identify promising candidates. Likewise, identifying different causal relations between biomedical entities is also critical to understand biomedical processes. Generally, natural language processing (NLP) and machine learning are used to predict specific relations between any given pair of entities using the distant supervision approach. Objective To build high accuracy supervised predictive models to predict previously unknown treatment and causative relations between biomedical entities based only on semantic graph pattern features extracted from biomedical knowledge graphs. Methods We used 7,000 treats and 2,918 causes hand-curated relations from the UMLS Metathesaurus to train and test our models. Our graph pattern features are extracted from simple paths connecting biomedical entities in the SemMedDB graph (based on the well-known SemMedDB database made available by the U.S. National Library of Medicine). Using these graph patterns connecting biomedical entities as features of logistic regression and decision tree models, we computed mean performance measures (precision, recall, F-score) over 100 distinct 80%–20% train-test splits of the datasets. For all experiments, we used a positive:negative class imbalance of 1:10 in the test set to model relatively more realistic scenarios. Results Our models predict treats and causes relations with high F-scores of 99% and 90% respectively. Logistic regression model coefficients also help us identify highly discriminative patterns that have an intuitive interpretation. We are also able to predict some new plausible relations based on false positives that our models scored highly based on our collaborations with two physician co-authors. Finally, our decision tree models are able to retrieve over 50% of treatment relations from a recently created external dataset. Conclusions We employed semantic graph patterns connecting pairs of candidate entities in a knowledge graph as features to predict treatment/causative relations between them. We provide what we believe is the first evidence in direct prediction of biomedical relations based on graph features. Our work complements lexical pattern based approaches in that the graph patterns can be used as additional features for weakly supervised relation prediction.
Identifying new potential treatment options (say, medications and procedures) for known medical conditions that cause human disease burden is a central task of biomedical research. Since all candidate drugs cannot be tested with animal and clinical trials, in vitro approaches are first attempted to identify promising candidates. Even before this step, due to recent advances, in silico or computational approaches are also being employed to identify viable treatment options. Generally, natural language processing (NLP) and machine learning are used to predict specific relations between any given pair of entities using the distant supervision approach. In this paper, we report preliminary results on predicting treatment relations between biomedical entities purely based on semantic patterns over biomedical knowledge graphs. As such, we refrain from explicitly using NLP, although the knowledge graphs themselves may be built from NLP extractions. Our intuition is fairly straightforward – entities that participate in a treatment relation may be connected using similar path patterns in biomedical knowledge graphs extracted from scientific literature. Using a dataset of treatment relation instances derived from the well known Unified Medical Language System (UMLS), we verify our intuition by employing graph path patterns from a well known knowledge graph as features in machine learned models. We achieve a high recall (92 %) but precision, however, decreases from 95% to an acceptable 71% as we go from uniform class distribution to a ten fold increase in negative instances. We also demonstrate models trained with patterns of length ≤ 3 result in statistically significant gains in F-score over those trained with patterns of length ≤ 2. Our results show the potential of exploiting knowledge graphs for relation extraction and we believe this is the first effort to employ graph patterns as features for identifying biomedical relations.
With the increasing use of digital technologies, online social networks are emerging as major means of communication. Recently, social networks such as Facebook and Twitter are also being used by consumers, care providers (physicians, hospitals), and government agencies to share health related information. The asymmetric user network and the short message size have made Twitter particularly popular for propagating health related content on the Web. Besides tweeting on their own, users can choose to retweet particular tweets from other users (even if they do not follow them on Twitter.) Thus, a tweet can diffuse through the Twitter network via the follower-friend connections. In this paper, we report results of a pilot study we conducted to quantitatively assess how health related tweets diffuse in the directed follower-friend Twitter graph through the retweeting activity. Our effort includes (1). development of a retweet collection and Twitter retweet graph formation framework and (2). a preliminary analysis of retweet graphs and associated diffusion metrics for health tweets. Given the ambiguous nature (due to polysemy and sarcasm) of health relatedness of tweets collected with keyword based matches, our initial study is limited to ≈ 200 health related tweets (which were manually verified to be on health topics) each with at least 25 retweets. To our knowledge, this is first attempt to study health information diffusion on Twitter through retweet graph analysis.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.