2017
DOI: 10.1155/2017/8565739

Identifying Human Phenotype Terms by Combining Machine Learning and Validation Rules

Abstract: Named-Entity Recognition is commonly used to identify biological entities such as proteins, genes, and chemical compounds found in scientific articles. The Human Phenotype Ontology (HPO) is an ontology that provides a standardized vocabulary for phenotypic abnormalities found in human diseases. This article presents the Identifying Human Phenotypes (IHP) system, tuned to recognize HPO entities in unstructured text. IHP uses Stanford CoreNLP for text processing and applies Conditional Random Fields trained with…
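
The abstract describes a two-stage design: a Conditional Random Fields tagger labels tokens produced by Stanford CoreNLP, and validation rules then adjust the result. The abstract is truncated here and does not state the rules, so the Python sketch below is only an illustration under stated assumptions: the BIO tag list stands in for real CRF output, no CoreNLP or CRF library is called, and both rules (absorbing an untagged modifier word on the left, dropping spans with no alphabetic non-stopword) together with the word lists `LEFT_MODIFIERS` and `STOPWORDS` are hypothetical, not IHP's actual rules.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) token offsets; end is exclusive

# Hypothetical word lists for illustration only; IHP's real rules are not given.
LEFT_MODIFIERS = {"abnormal", "short", "small", "large", "bilateral"}
STOPWORDS = {"the", "a", "an", "and", "of", "with"}


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Group BIO tags ("B", "I", "O") from a sequence tagger into token spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:  # a new "B" closes the span still open
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def apply_rules(tokens: List[str], tags: List[str], spans: List[Span]) -> List[Span]:
    """Hypothetical validation pass: extend spans leftwards, then drop junk spans."""
    kept: List[Span] = []
    for start, end in spans:
        # Assumed rule 1: absorb untagged modifier words directly to the left.
        while (start > 0 and tags[start - 1] == "O"
               and tokens[start - 1].lower() in LEFT_MODIFIERS):
            start -= 1
        # Assumed rule 2: keep only spans with at least one alphabetic non-stopword.
        if any(t.isalpha() and t.lower() not in STOPWORDS for t in tokens[start:end]):
            kept.append((start, end))
    return kept


def find_phenotype_spans(tokens: List[str], tags: List[str]) -> List[str]:
    """Tagger output -> spans -> rule pass -> span strings."""
    spans = apply_rules(tokens, tags, bio_to_spans(tags))
    return [" ".join(tokens[s:e]) for s, e in spans]


if __name__ == "__main__":
    tokens = ["Patient", "had", "short", "stature", "and", "seizures", "."]
    # Stand-in tagger output: "short" was missed and "." is a false positive.
    tags = ["O", "O", "O", "B", "O", "B", "B"]
    print(find_phenotype_spans(tokens, tags))  # ['short stature', 'seizures']
```

Keeping the rules as a separate pass over the tagger's spans lets them be edited without retraining the model; the abstract, as excerpted, does not say whether this was the authors' reason for the design.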

Cited by 27 publications (19 citation statements)
References 15 publications (18 reference statements)

“…However, the performance of the classifier can be further improved by including more refined semantic similarity methods. Additionally, text mining techniques may also be employed to find the functionality of genes in literature that are lacking semantic scores [32]. Moreover, to attain reliable ASD genes predictions, future studies should be focused on combining protein-protein interactions with semantic similarity scores.…”
Section: Discussion (mentioning; confidence: 99%)
“…cTAKES is a more general medical knowledge extraction system primarily designed for SNOMED-CT, while BioLarK and the OBO annotator are concept recognizers primarily tailored for the HPO. Another method, called IHP (Identifying Human Phenotypes) [15], was recently introduced for identifying HPO terms in unstructured text using machine learning for named entity recognition and a rule-based approach for further extending them. However, this method is not directly comparable, as it only reports the text spans that are a phenotype and does not classify or rank matching HPO terms.…”
Section: Results (mentioning; confidence: 99%)
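
The passage above separates two kinds of output: IHP returns text spans, while the concept recognizers it is compared against assign HPO terms to what they find. The Python sketch below illustrates that extra ranking step under loud assumptions: `TERMS` is a hand-picked list of HPO-style labels rather than the ontology itself, and token-set Jaccard similarity is a deliberately simple stand-in for scoring, not the method used by BioLarK, the OBO annotator, or any other cited tool.

```python
from typing import List, Set, Tuple

# Illustrative HPO-style labels; a real system would load the full ontology.
TERMS = [
    "Seizure",
    "Short stature",
    "Microcephaly",
    "Global developmental delay",
    "Developmental regression",
]


def token_set(text: str) -> Set[str]:
    """Lower-case whitespace tokenization (deliberately naive)."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_terms(span: str, terms: List[str], top_k: int = 3) -> List[Tuple[str, float]]:
    """Score every label against a detected span and return the best matches."""
    query = token_set(span)
    scored = [(term, jaccard(query, token_set(term))) for term in terms]
    scored = [pair for pair in scored if pair[1] > 0]
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


if __name__ == "__main__":
    # A span-only system would stop at the string "developmental delay".
    for term, score in rank_terms("developmental delay", TERMS):
        print(f"{term}: {score:.2f}")
    # Global developmental delay: 0.67
    # Developmental regression: 0.33
```

Plain token overlap misses inflected forms ("seizures" shares no token with "Seizure"), so practical recognizers add normalization, synonyms, or learned scoring; the sketch omits all of these.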
“…Examples of popular tools for general purpose are the NCBO (National Center for Biomedical Ontology) annotator [10], OBO (Open Biological and Biomedical Ontologies) annotator [11], MetaMap [12], and Apache cTAKES (Clinical Text Analysis and Knowledge Extraction System) [13]. Other tools focusing on more specific domains have also been developed, such as BioLark [14] for automatic recognition of terms from the HPO and a tool by Lobo et al [15], which combines a machine learning approach with manual validation rules to detect HPO terms. Another example is the phenotype search tool provided by PhenoTips [16], which uses Apache Solr indexed on the HPO and has an extensive set of rule-based techniques to rank matching phenotypes for a query.…”
Section: Introduction (mentioning; confidence: 99%)
“…Another rule-based synonym expansion approach to extending the Gene Ontology showed improved performance in named entity recognition (NER) tasks [9]. A combined machine-learning and rule-based approach to learning new HP synonyms from manually annotated PubMed abstracts improved performance of an annotation task over a gold standard text corpus [10].…”
Section: Introduction (mentioning; confidence: 99%)
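
Both approaches quoted above expand a vocabulary with rule-generated synonyms so that a dictionary-based recognizer finds more mentions. Neither rule set is given in the quote, so the Python sketch below uses two hypothetical rewrite rules (reordering "abnormality of the X" to "X abnormality", and a naive plural) and a small illustrative label list, only to show why such expansion can raise recall.

```python
import re
from typing import Dict, List, Set


def expand(label: str) -> Set[str]:
    """Return the lower-cased label plus variants from two hypothetical rules."""
    base = label.lower()
    variants = {base}
    # Hypothetical rule 1: "abnormality of the X" -> "X abnormality".
    m = re.fullmatch(r"abnormality of the (.+)", base)
    if m:
        variants.add(f"{m.group(1)} abnormality")
    # Hypothetical rule 2: naive plural by appending "s" to the whole label.
    if not base.endswith("s"):
        variants.add(base + "s")
    return variants


def build_lexicon(labels: List[str], use_rules: bool) -> Dict[str, str]:
    """Map each surface form to the label it came from."""
    lexicon: Dict[str, str] = {}
    for label in labels:
        forms = expand(label) if use_rules else {label.lower()}
        for form in forms:
            lexicon[form] = label
    return lexicon


def match(text: str, lexicon: Dict[str, str]) -> List[str]:
    """Return the sorted labels whose surface forms occur as whole words."""
    lowered = text.lower()
    found = {
        label
        for form, label in lexicon.items()
        if re.search(r"\b" + re.escape(form) + r"\b", lowered)
    }
    return sorted(found)


if __name__ == "__main__":
    labels = ["Seizure", "Abnormality of the kidney"]
    text = "She had recurrent seizures and a kidney abnormality."
    print(match(text, build_lexicon(labels, use_rules=False)))  # []
    print(match(text, build_lexicon(labels, use_rules=True)))
    # ['Abnormality of the kidney', 'Seizure']
```

Without expansion neither label is found, because "seizure" is not a whole word inside "seizures" and the reordered kidney phrase never appears verbatim; with the two assumed rules both are matched.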