Knowledge-based biomedical data science (KBDS) involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey the progress in the last year in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as on approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing, and the expansion of knowledge-based approaches to novel domains, such as Chinese Traditional Medicine and biodiversity.

TBoxes (for terminology), and assertions composed of them are ABoxes (for assertion) (12).
Knowledge-base vs. Knowledge Graph

Knowledge-bases that can be represented as graphs are often called knowledge graphs. While not all knowledge-bases are implemented as graphs (e.g., some are databases where table structure makes implicit assertions), in recent years it has become very common to represent knowledge-bases using Semantic Web standards or, at least, to be able to produce and consume Semantic Web-compatible versions. For that reason, the terms knowledge-base and knowledge graph are often used interchangeably. In 2012, Google announced its proprietary Knowledge Graph, which also popularized the use of the term (13). The literature is sometimes imprecise about the differences between knowledge-bases, knowledge graphs, and ontologies; a review and analysis of various published definitions is available (14). In this review, we use the term knowledge graph (or KG) and say a KG is grounded in the set of primitives from which it is constructed. Some KGs also include a set of logical rules, called axioms, that relate assertions to each other (e.g., Human TP53 is the subclass of TP53 proteins that is found in the organism human).
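To make the notions of primitives, assertions, and axioms concrete, the following is a minimal Python sketch of a toy KG stored as subject-predicate-object triples. It is an illustration only, not the representation used by any system surveyed here: the identifiers (HumanTP53, foundIn, and so on) are hypothetical labels rather than real ontology IRIs, and the single rule applied, transitivity of the subclass relation, is an assumed stand-in for the richer axioms a real KG might carry.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical primitives (illustrative labels, not real ontology IRIs).
PRIMITIVES = {"TP53", "HumanTP53", "Protein", "Human", "subClassOf", "foundIn"}

# Assertions composed of those primitives.
TRIPLES = {
    Triple("HumanTP53", "subClassOf", "TP53"),
    Triple("TP53", "subClassOf", "Protein"),
    Triple("HumanTP53", "foundIn", "Human"),
}


def is_grounded(triples, primitives):
    """Return True if every term of every triple is a known primitive."""
    return all(term in primitives for t in triples for term in t)


def subclass_closure(triples):
    """Apply one assumed rule to a fixed point: subClassOf is transitive."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        # Snapshot the current subclass assertions before adding new ones.
        sub = [t for t in inferred if t.predicate == "subClassOf"]
        for a in sub:
            for b in sub:
                new = Triple(a.subject, "subClassOf", b.obj)
                if a.obj == b.subject and new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred


if __name__ == "__main__":
    print(is_grounded(TRIPLES, PRIMITIVES))
    for triple in sorted(subclass_closure(TRIPLES)):
        print(triple)
```

In this sketch the grounding check passes because every term appears in the primitive set, and the closure adds one assertion that was never stated explicitly: HumanTP53 is a subclass of Protein. Production systems would typically delegate both steps to a Semantic Web toolkit and reasoner rather than hand-written loops.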
Biomedical Applications

KBDS computes over KGs (and perhaps other inputs) to make inferences about biomedicine. Although the publications surveyed below address different problems using different techniques, there are some common themes in the computational approaches to using KGs.