2019
DOI: 10.1200/cci.18.00132

Open Source Infrastructure for Health Care Data Integration and Machine Learning Analyses

Abstract: PURPOSE We have created a cloud-based machine learning system (CLOBNET) that is an open-source, lean infrastructure for electronic health record (EHR) data integration and is capable of extract, transform, and load (ETL) processing. CLOBNET enables comprehensive analysis and visualization of structured EHR data. We demonstrate the utility of CLOBNET by predicting primary therapy outcomes of patients with high-grade serous ovarian cancer (HGSOC) on the basis of EHR data. MATERIALS AND METHODS CLOBNET is built u…

Cited by 8 publications (10 citation statements); references 15 publications.

Citation statements:
“…The operating team systematically assessed and documented the disease spread and tumor volume in the peritoneal cavity and retroperitoneum with a standardized 16-part questionnaire during surgery. Each abdominal site and possible metastasis was included, and a recently validated disease dissemination score ranging from 0 to 21 was calculated (Table 2) [28]. We divided patients to a low tumor load group (score 0-11) and a high tumor load group (score 12-21) based on the disease dissemination score.…”
Section: Study Population (mentioning, confidence: 99%)
“…The operating gynecologic oncologist assessed the disease stage as stated by the International Federation of Gynecology and Obstetrics (FIGO) 2014 classification and a pathologist confirmed it from tissue biopsies. The operating team recorded the disease spread in the abdominal cavity and retroperitoneum to calculate a previously described disease dissemination score (range 0-21) (Table S1) [24]. The disease dissemination score showed prognostic value in a previous study [24].…”
Section: Study Population (mentioning, confidence: 99%)
“…The operating team recorded the disease spread in the abdominal cavity and retroperitoneum to calculate a previously described disease dissemination score (range 0-21) (Table S1) [24]. The disease dissemination score showed prognostic value in a previous study [24]. Patients were divided into a low tumor load group (dissemination score 0-12) and a high tumor load group (dissemination score 13-21).…”
Section: Study Population (mentioning, confidence: 99%)
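The low/high tumor load split in the statements above is a single threshold on the 0-21 dissemination score, and the two citing studies place that threshold differently (0-11 vs 12-21, and 0-12 vs 13-21). A minimal sketch, assuming the score has already been computed (the scoring rule itself is not given here); the function name and cutoff constants are hypothetical.

```python
def tumor_load_group(score: int, high_cutoff: int) -> str:
    """Classify a disease dissemination score (0-21) as 'low' or 'high' tumor load.

    high_cutoff is the smallest score counted as high load.
    """
    if not 0 <= score <= 21:
        raise ValueError(f"dissemination score must be in 0-21, got {score}")
    return "high" if score >= high_cutoff else "low"


# Hypothetical constants matching the two splits quoted above.
CUTOFF_0_11_VS_12_21 = 12
CUTOFF_0_12_VS_13_21 = 13

for s in (11, 12, 13):
    print(s, tumor_load_group(s, CUTOFF_0_11_VS_12_21),
          tumor_load_group(s, CUTOFF_0_12_VS_13_21))
# 11 low low
# 12 high low
# 13 high high
```

Keeping the cutoff as a parameter makes the disagreement explicit: a patient scoring exactly 12 lands in different groups depending on which study's convention is applied.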
“…The data integration approaches from recent years use a combination of both approaches. Several papers [46][47][48][58][59][60] present ETL database functions that allow pulling data from a source database and placing it into a target database. The first three of these papers define manual processes by which local sites define mappings between source data elements (SDE) to a set of needed common data elements (CDE).…”
Section: Data Integration Via Ontologies and Vocabularies (mentioning, confidence: 99%)
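The ETL pattern in this statement (pull data from a source database, map source data elements to common data elements, place the result in a target database) can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under assumed names: the table names, column names, toy record and the `SDE_TO_CDE` mapping are hypothetical and are not taken from CLOBNET or from any of the cited papers.

```python
import sqlite3

# Hypothetical mapping from source data elements (SDE) to common data
# elements (CDE); in the approaches described above, each site defines
# such a mapping manually.
SDE_TO_CDE = {
    "pt_id": "patient_id",
    "dob": "birth_date",
    "ca125_u_ml": "ca125",
}


def extract(source: sqlite3.Connection) -> list[dict]:
    """Extract: pull every row of the hypothetical source table as a dict."""
    source.row_factory = sqlite3.Row
    return [dict(row) for row in source.execute("SELECT * FROM lab_results")]


def transform(rows: list[dict], mapping: dict[str, str]) -> list[dict]:
    """Transform: rename mapped SDE columns to CDE names, drop unmapped ones."""
    return [{mapping[k]: v for k, v in row.items() if k in mapping} for row in rows]


def load(target: sqlite3.Connection, rows: list[dict]) -> None:
    """Load: insert the CDE-keyed rows into the target database."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS cde_records "
        "(patient_id TEXT, birth_date TEXT, ca125 REAL)"
    )
    target.executemany(
        "INSERT INTO cde_records VALUES (:patient_id, :birth_date, :ca125)", rows
    )
    target.commit()


# Usage with in-memory databases and one toy record.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE lab_results (pt_id TEXT, dob TEXT, ca125_u_ml REAL, ward TEXT)"
)
source.execute("INSERT INTO lab_results VALUES ('p1', '1960-01-01', 35.0, 'B2')")
target = sqlite3.connect(":memory:")
load(target, transform(extract(source), SDE_TO_CDE))
print(target.execute("SELECT * FROM cde_records").fetchall())
# [('p1', '1960-01-01', 35.0)]
```

The unmapped `ward` column is silently dropped in the transform step; a real pipeline would more likely log or reject unmapped elements so that gaps in the site's mapping are visible.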
“…This clustering is relevant to discriminate between different breast cancer subtypes and to identify their relations and has the strength that it can be performed in an unsupervised manner, removing much of the burden of integration based on information models and ontologies. Isoviita et al [48] present an open-source, cloud-based machine learning system where datasets from multiple (live) sources (EHR databases and research databases) are integrated using extract, transform, load (ETL) processes and melded into a single database, but with minimal transformations. These merged but heterogeneous data are used for the training of ML predictive models.…”
Section: Data Integration Supported By Machine Learning (mentioning, confidence: 99%)
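One possible reading of data "melded into a single database, but with minimal transformations" is a long (patient, source, variable, value) layout that keeps each source's own variable names and is pivoted into per-patient feature dictionaries only when a model is trained. This is an assumption for illustration, not CLOBNET's documented schema; the records, field names and function names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical records from two sources with different schemas.
ehr_rows = [
    {"patient": "p1", "ca125": 410.0, "stage": "IIIC"},
    {"patient": "p2", "ca125": 95.0},
]
research_rows = [
    {"id": "p1", "residual_tumor_mm": 0},
    {"id": "p2", "residual_tumor_mm": 10},
]


def to_long(rows, source, id_key):
    """Flatten rows into (patient_id, source, variable, value) tuples without
    renaming or harmonizing the source variables (minimal transformation)."""
    return [
        (row[id_key], source, var, val)
        for row in rows
        for var, val in row.items()
        if var != id_key
    ]


def to_feature_matrix(long_rows):
    """Pivot long-format tuples into one feature dict per patient, prefixing
    each variable name with its source to avoid name collisions."""
    features = defaultdict(dict)
    for patient_id, source, var, val in long_rows:
        features[patient_id][f"{source}.{var}"] = val
    return dict(features)


merged = to_long(ehr_rows, "ehr", "patient") + to_long(research_rows, "research", "id")
matrix = to_feature_matrix(merged)
print(matrix["p2"])
# {'ehr.ca125': 95.0, 'research.residual_tumor_mm': 10}
```

Because nothing is harmonized, patient `p2` simply has no `ehr.stage` key; the downstream model then has to tolerate missing features, which is the trade-off for skipping most of the transform step.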