A computational framework for converting textual clinical diagnostic criteria into the quality data model

Hong, Na; Li, Dingcheng; Yu, Yue; Xiu, Qiongying; Liu, Hongfang; Jiang, Guoqian

doi:10.1016/j.jbi.2016.07.016

Cited by 5 publications

(3 citation statements)

References 26 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The database was searched with the keywords “endometriosis and genes” and “endometriosis and genetic”. The relevant publications were retrieved in extensible markup language (XML) format in order to make the information extraction more precise, with content enclosed within XML tag pairs [ 19 ]. The titles and abstracts of each article were converted into the PubTator format [ 20 ] through a custom Perl script.…”

Section: Methodsmentioning

confidence: 99%

How Artificial Intelligence Can Improve Our Understanding of the Genes Associated with Endometriosis: Natural Language Processing of the PubMed Database

Bouaziz

Mashiach

Cohen

et al. 2018

BioMed Research International

View full text Add to dashboard Cite

Endometriosis is a disease characterized by the development of endometrial tissue outside the uterus, but its cause remains largely unknown. Numerous genes have been studied and proposed to help explain its pathogenesis. However, the large number of these candidate genes has made functional validation through experimental methodologies nearly impossible. Computational methods could provide a useful alternative for prioritizing those most likely to be susceptibility genes. Using artificial intelligence applied to text mining, this study analyzed the genes involved in the pathogenesis, development, and progression of endometriosis. The data extraction by text mining of the endometriosis-related genes in the PubMed database was based on natural language processing, and the data were filtered to remove false positives. Using data from the text mining and gene network information as input for the web-based tool, 15,207 endometriosis-related genes were ranked according to their score in the database. Characterization of the filtered gene set through gene ontology, pathway, and network analysis provided information about the numerous mechanisms hypothesized to be responsible for the establishment of ectopic endometrial tissue, as well as the migration, implantation, survival, and proliferation of ectopic endometrial cells. Finally, the human genome was scanned through various databases using filtered genes as a seed to determine novel genes that might also be involved in the pathogenesis of endometriosis but which have not yet been characterized. These genes could be promising candidates to serve as useful diagnostic biomarkers and therapeutic targets in the management of endometriosis.

show abstract

Section: Methodsmentioning

confidence: 99%

How Artificial Intelligence Can Improve Our Understanding of the Genes Associated with Endometriosis: Natural Language Processing of the PubMed Database

Bouaziz

Mashiach

Cohen

et al. 2018

BioMed Research International

View full text Add to dashboard Cite

show abstract

“…The overall performance of this system is measured in precision, recall, and F-Score. In addition, the same authors, Hong et al [59] proposed a model for the quality and performance-based data integration for information extraction, using NLP, ML, and Bag of Words (BoW). Moreover, Hong et al [60] used a Mayo Clinic dataset with the help of NLP toolkits for making a digital FHIR system.…”

Section: How Does the Data Harmonization Resolve The Issues Of Heterogeneity?mentioning

confidence: 99%

“…[57] Healthcare Helps in patient-lefted care decision-making among stakeholders [58] Healthcare Helps in finding the patient having obesity and comorbidities [59] Healthcare Helps in developing patient diagnostic criteria and representation [61] General-Purpose Support in integration, storage, computation, and visualization [62] Healthcare Open biomedical repositories can be developed in semantic web formats [60] Healthcare Normalizing and integration of structured and unstructured EHR data [63] Healthcare Helps health information system to keep a record of patients' data [64] Healthcare Helps in standardizing the clinical data normalization In previous studies, SSU heterogeneous data were used in the form of text, images, audio, video, and social media formats. The BD and BDA literature reviews proposed so many models and frameworks for data harmonization or integration.…”

Section: Study Reference Domain Contributionsmentioning

confidence: 99%

Data Harmonization for Heterogeneous Datasets: A Systematic Literature Review

et al. 2021

View full text Add to dashboard Cite

As data size increases drastically, its variety also increases. Investigating such heterogeneous data is one of the most challenging tasks in information management and data analytics. The heterogeneity and decentralization of data sources affect data visualization and prediction, thereby influencing analytical results accordingly. Data harmonization (DH) corresponds to a field that unifies the representation of such a disparate nature of data. Over the years, multiple solutions have been developed to minimize the heterogeneity aspects and disparity in formats of big-data types. In this study, a systematic review of the literature was conducted to assess the state-of-the-art DH techniques. This study aimed to understand the issues faced due to heterogeneity, the need for DH and the techniques that deal with substantial heterogeneous textual datasets. The process produced 1355 articles, but among them, only 70 articles were found to be relevant through inclusion and exclusion criteria methods. The result shows that the heterogeneity of structured, semi-structured, and unstructured (SSU) data can be managed by using DH and its core techniques, such as text preprocessing, Natural Language Preprocessing (NLP), machine learning (ML), and deep learning (DL). These techniques are applied to many real-world applications centered on the information-retrieval domain. Several assessment criteria were implemented to measure the efficiency of these techniques, such as precision, recall, F-1, accuracy, and time. A detailed explanation of each research question, common techniques, and performance measures is also discussed. Lastly, we present readers with a detailed discussion of the existing work, contributions, and managerial and academic implications, along with the conclusion, limitations, and future research directions.

show abstract