We present a machine learning framework to automate knowledge discovery through knowledge graph construction, inconsistency resolution, and iterative link prediction. By incorporating knowledge from 10 publicly available sources, we construct an Escherichia coli antibiotic resistance knowledge graph with 651,758 triples from 23 triple types after resolving 236 sets of inconsistencies. Iteratively applying link prediction to this graph and validating the generated hypotheses in the wet lab reveals 15 antibiotic-resistant E. coli genes, 6 of which had never been associated with antibiotic resistance in any microbe. Iterative link prediction leads to a performance improvement and more findings. The probability of positive findings correlates highly with experimentally validated findings (R² = 0.94). We also identify 5 homologs in Salmonella enterica that are all validated to confer resistance to antibiotics. This work demonstrates how evidence-driven decisions are a step toward automating knowledge discovery with high confidence and at an accelerated pace, thereby substituting for traditional time-consuming and expensive methods.
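The abstract does not say how the 236 sets of inconsistencies were detected or resolved. As a minimal sketch only, assuming that an inconsistency means two sources asserting contradictory predicates for the same gene and antibiotic, and assuming a simple majority vote as the resolution rule, the idea might look like the following. The predicate strings, gene names, and source names are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical predicate labels for a pair of contradictory statements.
POSITIVE = "confers resistance to"
NEGATIVE = "confers no resistance to"


def find_inconsistencies(triples):
    """Group (subject, predicate, object, source) records by (subject, object)
    and return only the pairs for which both predicates are asserted."""
    by_pair = defaultdict(list)
    for subject, predicate, obj, source in triples:
        if predicate in (POSITIVE, NEGATIVE):
            by_pair[(subject, obj)].append((predicate, source))
    return {
        pair: records
        for pair, records in by_pair.items()
        if len({predicate for predicate, _ in records}) > 1
    }


def resolve_by_majority(records):
    """Return the predicate asserted by more records, or None on a tie.
    Assumes records contain both predicates, as find_inconsistencies guarantees."""
    counts = Counter(predicate for predicate, _ in records)
    (top, n_top), (_, n_second) = counts.most_common(2)
    return top if n_top > n_second else None


# Toy records, invented for illustration.
triples = [
    ("geneA", POSITIVE, "ampicillin", "source1"),
    ("geneA", POSITIVE, "ampicillin", "source2"),
    ("geneA", NEGATIVE, "ampicillin", "source3"),
    ("geneB", POSITIVE, "ampicillin", "source1"),
]

conflicts = find_inconsistencies(triples)
resolved = {pair: resolve_by_majority(records) for pair, records in conflicts.items()}
```

A tie returns None so that the conflicting pair can be set aside for manual review rather than resolved arbitrarily.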
Food ontologies require significant effort to create and maintain as they involve manual and time-consuming tasks, often with limited alignment to the underlying food science knowledge. We propose a semi-supervised framework for automated ontology population from an existing ontology scaffold using word embeddings. After applying this framework to the food domain and evaluating it against an expert-curated ontology, FoodOn, we observe that the food word embeddings capture the latent relationships and characteristics of foods. The resulting ontology, which utilizes word embeddings trained from the Wikipedia corpus, has an improvement of 89.7% in precision when compared to the expert-curated ontology FoodOn (0.34 vs. 0.18, respectively, p value = 2.6 × 10⁻¹³⁸), and it has a 43.6% shorter path distance (hops) between predicted and actual food instances (2.91 vs. 5.16, respectively, p value = 4.7 × 10⁻⁸⁴) when compared to other methods. This work demonstrates how high-dimensional representations of food can be used to populate ontologies and paves the way for learning ontologies that integrate contextual information from a variety of sources and types.
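One way to picture embedding-based ontology population is to place each new food under the scaffold class whose seed members are closest to it in embedding space. The sketch below assumes cosine similarity to a class centroid as the assignment rule; the abstract does not specify the scoring function, and the three-dimensional vectors, class names, and food names are toy values invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def assign_class(entity_vec, class_seeds):
    """Return the class whose seed centroid is most similar to entity_vec,
    together with the similarity score for every class."""
    scores = {
        name: cosine(entity_vec, centroid(seeds))
        for name, seeds in class_seeds.items()
    }
    return max(scores, key=scores.get), scores


# Toy embeddings (hypothetical values, not trained vectors).
embeddings = {
    "apple": [0.9, 0.1, 0.0],
    "pear": [0.8, 0.2, 0.1],
    "cheddar": [0.1, 0.9, 0.2],
    "brie": [0.0, 0.8, 0.3],
    "plum": [0.85, 0.15, 0.05],
}

# Scaffold: classes with a few seed instances already in place.
scaffold = {"fruit": ["apple", "pear"], "cheese": ["cheddar", "brie"]}
class_seeds = {
    name: [embeddings[member] for member in members]
    for name, members in scaffold.items()
}

predicted, scores = assign_class(embeddings["plum"], class_seeds)
```

With real embeddings, the seed sets would grow as entities are assigned, which is one plausible reading of the semi-supervised setup described above.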
The study's objectives were to identify cow-level and environmental factors associated with metritis cure and to predict metritis cure using traditional statistics and machine learning algorithms. The data set used was from a previous study comparing the efficacy of different therapies and self-cure for metritis. Metritis was defined as fetid, watery, reddish-brownish discharge, with or without fever. Cure was defined as an absence of metritis signs 12 d after diagnosis. Cows were randomly allocated to receive either a subcutaneous injection of 6.6 mg/kg of ceftiofur crystalline-free acid (Excede, Zoetis) on the day of diagnosis and 3 d later (n = 275) or no treatment at the time of metritis diagnosis (n = 275). The variables days in milk (DIM) at metritis diagnosis, treatment, season of the metritis diagnosis, month of metritis diagnosis, number of lactation, parity, calving score, dystocia, retained fetal membranes, body condition score at d 5 postpartum, vulvovaginal laceration score, rectal temperature at metritis diagnosis, fever at diagnosis, milk production from the day before to metritis diagnosis, and milk production slope up to 5, 7, and 9 DIM were offered to univariate logistic regression. Variables included in the multivariable logistic regression model were selected from the univariate analysis according to P-value. Variables were offered to the model to assess the association between these factors and metritis cure. Additionally, the univariate logistic regression variables were offered to recursive feature elimination to find the optimal subset of features for the machine learning analysis. Cows without vulvovaginal laceration had 1.91 times higher odds of being cured of metritis than cows with vulvovaginal laceration. Cows that developed metritis at >7 DIM had 2.09 times higher odds of being cured than cows that developed metritis at ≤7 DIM.
For rectal temperature, each degree Celsius above 39.4°C was associated with lower odds of cure compared with cows with a rectal temperature ≤39.4°C. Furthermore, milk production slope and milk production difference from the day before to the metritis diagnosis were essential variables to predict metritis cure. Cows that had reduced milk production from the day before to the metritis diagnosis had lower odds of being cured than cows with a moderate milk production increase. The results from the multivariable logistic regression and receiver operating characteristic analysis indicated that cows developing metritis at >7 DIM, with an increase in milk production, and with a rectal temperature ≤39.4°C had an increased likelihood of metritis cure, with an accuracy of 75%. The machine learning analysis showed that, in addition to these variables, calving-related disorders, season, and month of metritis event were needed to predict whether a cow would be cured of metritis with an accuracy ≥70% and F1 score (harmonic mean between precision and recall) ≥0.78. Although machine learning algorithms are acknowledged as powerful tools for predictive classification, the current study was unable to repli...
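The two summary statistics used above, the odds ratio and the F1 score, can be computed as in the sketch below. The function names are illustrative, and the counts and precision/recall values are hypothetical; they are not data from the study.

```python
def odds_ratio(cured_a, not_cured_a, cured_b, not_cured_b):
    """Odds of cure in group A divided by odds of cure in group B,
    computed from a 2x2 table of counts."""
    odds_a = cured_a / not_cured_a
    odds_b = cured_b / not_cured_b
    return odds_a / odds_b


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 60 cured / 40 not cured in group A,
# 30 cured / 70 not cured in group B.
example_or = odds_ratio(60, 40, 30, 70)

# Hypothetical classifier with precision 0.75 and recall 0.82.
example_f1 = f1_score(0.75, 0.82)
```

An odds ratio above 1 means group A has higher odds of cure than group B, which is how statements such as "1.91 times higher odds" are read. Because F1 is a harmonic mean, it is pulled toward the smaller of precision and recall, so it cannot be high unless both are.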
The ability of knowledge graphs to represent complex relationships at scale has led to their adoption for various needs including knowledge representation, question-answering, fraud detection, and recommendation systems. Knowledge graphs are often incomplete in the information they represent, necessitating knowledge graph completion tasks such as link and relation prediction. Pre-trained and fine-tuned language models have shown promise in these tasks, although these models ignore the intrinsic information encoded in the knowledge graph, namely the entity and relation types. In this work, we propose the Knowledge Graph Language Model (KGLM) architecture, where we introduce a new entity/relation embedding layer that learns to differentiate distinctive entity and relation types, thereby allowing the model to learn the structure of the knowledge graph. We show that further pre-training the language models with this additional embedding layer using the triples extracted from the knowledge graph, followed by the standard fine-tuning phase, sets a new state-of-the-art performance for the link prediction task on the benchmark datasets.
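The entity/relation embedding layer can be pictured as a second lookup table whose rows are added to the usual token embeddings, so that the same token is represented differently depending on whether it appears as a head entity, a relation, or a tail entity. The sketch below illustrates that idea only; it is not the KGLM implementation. The three roles, the tiny vocabulary, the dimension, and the random initialization are all assumptions, and in a real model both tables would be learned during pre-training.

```python
import random

random.seed(0)

DIM = 4  # toy embedding size (assumption)
TYPES = {"head": 0, "relation": 1, "tail": 2}  # assumed role inventory
vocab = {"aspirin": 0, "treats": 1, "headache": 2}  # hypothetical tokens


def make_table(n_rows, dim):
    """Randomly initialized lookup table standing in for learned weights."""
    return [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_rows)]


token_table = make_table(len(vocab), DIM)
type_table = make_table(len(TYPES), DIM)


def embed_triple(head, relation, tail):
    """Return one input vector per triple element: token embedding plus
    the embedding of the role that element plays in the triple."""
    rows = []
    for token, role in ((head, "head"), (relation, "relation"), (tail, "tail")):
        token_vec = token_table[vocab[token]]
        type_vec = type_table[TYPES[role]]
        rows.append([t + r for t, r in zip(token_vec, type_vec)])
    return rows


inputs = embed_triple("aspirin", "treats", "headache")
```

Summing the two vectors mirrors how transformer language models commonly combine token embeddings with segment or position embeddings, which is why it is used here as a stand-in.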