For several decades, ontologies have been a key element of information systems, including those in the epidemiological surveillance domain. Building domain ontologies requires access to domain knowledge held by domain experts or contained in knowledge sources. However, domain experts are not always available for interviews. Ontology learning, which consists of the automatic or semi-automatic extraction of ontological knowledge from structured or unstructured knowledge sources such as texts and databases, is therefore valuable. Many techniques have been proposed, but they are limited to extracting concepts, properties and terminology, leaving axioms and rules aside. Source code, which naturally embeds domain knowledge, is rarely used as a source. In this paper, we propose an approach based on Hidden Markov Models (HMMs) for learning concepts, properties, axioms and rules from Java source code. We evaluated this approach on the source code of EPICAM, an epidemiological platform developed in Java and used in Cameroon for tuberculosis surveillance. Domain experts involved in the evaluation judged the extracted knowledge relevant to the domain. In addition, we automatically evaluated the relevance of the extracted terms to the medical domain by aligning them with ontologies hosted on the BioPortal platform through its Ontology Recommender tool. The results are encouraging: the extracted terms reached a coverage of 82.9% across several biomedical ontologies such as NCIT, SNOMEDCT and ONTOPARON.
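To make the HMM idea more concrete, the sketch below runs Viterbi decoding over a short sequence of Java tokens, assigning each token a hidden label. The label set (`CONCEPT`, `PROPERTY`, `OTHER`), the `observe` mapping from tokens to observation symbols and all probabilities are hypothetical values chosen for illustration; they are not the model, features or parameters used on EPICAM.

```python
import math

# Hypothetical hidden states: the label behind each Java token.
STATES = ("CONCEPT", "PROPERTY", "OTHER")
JAVA_KEYWORDS = {"public", "private", "protected", "class", "static", "final"}

def observe(token: str) -> str:
    """Map a raw Java token to a coarse observation symbol (hypothetical scheme)."""
    if token in JAVA_KEYWORDS:
        return "KEYWORD"
    if token[0].isupper():
        return "CAP_IDENT"    # e.g. class or type names such as Patient
    if token[0].isalpha():
        return "LOWER_IDENT"  # e.g. field or method names such as age
    return "PUNCT"

# Illustrative, hand-set parameters (not estimated from any real code base).
START = {"CONCEPT": 0.1, "PROPERTY": 0.1, "OTHER": 0.8}
TRANS = {
    "CONCEPT":  {"CONCEPT": 0.1, "PROPERTY": 0.5, "OTHER": 0.4},
    "PROPERTY": {"CONCEPT": 0.1, "PROPERTY": 0.1, "OTHER": 0.8},
    "OTHER":    {"CONCEPT": 0.4, "PROPERTY": 0.3, "OTHER": 0.3},
}
EMIT = {
    "CONCEPT":  {"KEYWORD": 0.01, "CAP_IDENT": 0.90, "LOWER_IDENT": 0.08, "PUNCT": 0.01},
    "PROPERTY": {"KEYWORD": 0.01, "CAP_IDENT": 0.09, "LOWER_IDENT": 0.89, "PUNCT": 0.01},
    "OTHER":    {"KEYWORD": 0.50, "CAP_IDENT": 0.20, "LOWER_IDENT": 0.10, "PUNCT": 0.20},
}

def viterbi(tokens):
    """Return the most likely (token, label) sequence under the toy HMM."""
    obs = [observe(t) for t in tokens]
    # Log-probabilities avoid numeric underflow on long token sequences.
    scores = [{s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}]
    back = [{}]
    for t in range(1, len(obs)):
        scores.append({})
        back.append({})
        for s in STATES:
            # Best predecessor state for landing in state s at step t.
            prev, best = max(
                ((p, scores[t - 1][p] + math.log(TRANS[p][s])) for p in STATES),
                key=lambda pair: pair[1],
            )
            scores[t][s] = best + math.log(EMIT[s][obs[t]])
            back[t][s] = prev
    # Trace the best path backwards from the highest-scoring final state.
    path = [max(STATES, key=lambda s: scores[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return list(zip(tokens, path))

print(viterbi(["public", "class", "Patient"]))
# → [('public', 'OTHER'), ('class', 'OTHER'), ('Patient', 'CONCEPT')]
```

In a realistic setting, the transition and emission tables would be estimated from annotated source code (for example by counting label transitions, or with Baum-Welch when labels are unavailable) rather than set by hand as here.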
Food Composition Tables (FCT), also called Food Composition Databases (FCD), describe the foods we eat and their composition. They are built through chemical analyses that determine the composition and structure of foods. However, the chemical analysis of food requires significant financial resources and skilled laboratory investigators, and these resources are not always available. Consequently, those who build Food Composition Tables often rely on existing resources such as scientific papers. Scientific papers contain key insights, organized in text, tables, figures, etc., that convey the scientific contribution of their authors. Many FCT are published as tables in scientific papers on food, nutrition, food chemistry, etc. Acquiring these tables manually, as domain experts currently do, is costly, cumbersome and does not scale: one has to open each paper, copy the elements one by one and save them in a file such as a CSV file. This paper proposes to learn the Food Composition Knowledge (FCK) stored in tables of scientific papers. The approach uses Deep Learning techniques to automatically detect tables, recognize the text they contain, extract that text and reconstruct the tables. This approach was used to extract over 10,000 tables from around 5,000 scientific papers. To validate the extracted knowledge, we presented 100 manually selected tables to a Professor of Food Science and Nutrition. In addition, validation with the BioPortal Ontology Recommender showed that the extracted knowledge is relevant to the biomedical domain in general and can be used to enrich food ontologies.
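The final table-reconstruction step can be illustrated with a simple rule-based heuristic: given words returned by a text-recognition step together with their bounding-box positions, group them into rows by vertical position and into columns by horizontal position, then write the grid as CSV. This is a sketch under stated assumptions, not the Deep Learning pipeline described above; the `Word` type, the tolerances `y_tol` and `x_tol`, and the sample words (with made-up values, not measurements) are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    x: float  # left edge of the word's bounding box (hypothetical OCR output)
    y: float  # top edge of the word's bounding box

def group_rows(words, y_tol=5.0):
    """Cluster words whose top edge lies within y_tol of the row's first word."""
    rows = []
    for w in sorted(words, key=lambda w: (w.y, w.x)):
        if rows and abs(w.y - rows[-1][0].y) <= y_tol:
            rows[-1].append(w)
        else:
            rows.append([w])
    return [sorted(row, key=lambda w: w.x) for row in rows]

def column_starts(words, x_tol=10.0):
    """Estimate column left edges by clustering sorted word x-positions."""
    starts = []
    for x in sorted(w.x for w in words):
        if not starts or x - starts[-1] > x_tol:
            starts.append(x)
    return starts

def to_csv(words, y_tol=5.0, x_tol=10.0):
    """Rebuild a table grid from positioned words and serialize it as CSV."""
    starts = column_starts(words, x_tol)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in group_rows(words, y_tol):
        cells = [""] * len(starts)
        for w in row:
            # Assign each word to the nearest estimated column start.
            col = min(range(len(starts)), key=lambda i: abs(w.x - starts[i]))
            cells[col] = (cells[col] + " " + w.text).strip()
        writer.writerow(cells)
    return out.getvalue()

# Hypothetical recognized words (illustrative values, not measurements).
SAMPLE_WORDS = [
    Word("Food", 10, 100), Word("Protein", 120, 101), Word("Fat", 220, 99),
    Word("Maize", 10, 130), Word("1.0", 122, 131), Word("2.0", 221, 129),
    Word("Cassava", 11, 160), Word("3.0", 121, 160), Word("4.0", 219, 161),
]
print(to_csv(SAMPLE_WORDS), end="")
# Food,Protein,Fat
# Maize,1.0,2.0
# Cassava,3.0,4.0
```

Fixed pixel tolerances work only for cleanly aligned tables; spanning headers, multi-line cells and skewed scans are exactly the cases where a learned detector and recognizer, as proposed in the paper, would be expected to do better.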