2020
DOI: 10.7717/peerj.8580

DISNET: a framework for extracting phenotypic disease information from public sources

Abstract: Background: Within the global endeavour of improving population health, one major challenge is the identification and integration of medical knowledge spread across several information sources. The creation of a comprehensive dataset of diseases and their clinical manifestations based on information from public sources is an interesting approach that allows one not only to complement and merge medical knowledge but also to increase it, and thereby to interconnect existing data and analyse and rel…




Cited by 36 publications (17 citation statements)
References 74 publications
“…For the extraction of medical terms through the bio-NER tools, we used a dataset consisting of excerpts of 7500 Wikipedia articles and 620 Mayo Clinic articles, obtained between 2019 and 2020 as part of the DISNET project 35 . Each article is associated with a single disease, and there may be more than one article for the same disease.…”
Section: Methods (mentioning)
confidence: 99%
“…The DISNET database integrates phenotypic and genetic-biological characteristics of diseases and information on drugs from several expert-curated sources and unstructured textual sources 35 . Phenotypic data is extracted from Wikipedia, PubMed, and Mayo Clinic texts, using MetaMap and a validation system called term validation process (TVP).…”
Section: Methods (mentioning)
confidence: 99%
“…We queried the 2020-06-01 dump of Wikidata for items that contained a concept ID from UMLS, RxNorm, NDF-RT, ICD-9, ICD-10, or LOINC to search for Wikipedia medical articles. We also compared our result with the 2020-06-01 version of DISNET [25], which was based on DBpedia and focused on diseases.…”
Section: Evaluation Methods (mentioning)
confidence: 99%
“…Semantic web projects and efforts associated with Wikipedia can be used to identify some of these categories [2,6,8,25]. For example, DBpedia [26] provides class labels that can help identify articles of certain categories, such as diseases and live beings, but it does not cover all target categories.…”
Section: Introduction (mentioning)
confidence: 99%
“…The current work has been developed in the context of DISNET system [23], a web service to extract disease knowledge structured in the basic concepts of the Human Disease Networks (HDN) [15], [16]. Although at this moment only its phenotypical data is publicly available, the system aims to form a complex multilayer graph.…”
Section: A. DISNET Project and Data Acquisition (mentioning)
confidence: 99%