CardioNet: a manually curated database for artificial intelligence-based research on cardiovascular diseases

Ahn, Imjin; Na, Wonjun; Kwon, Osung; Yang, Dong Hyun; Park, Gyung‐Min; Gwon, Hansle; Kang, Hee-Jun; Jeong, Yeon Uk; Yoo, Jungsun; Kim, Yunha; Jun, Tae Joon; Kim, Young‐Hak

doi:10.1186/s12911-021-01392-2

Cited by 22 publications (21 citation statements)

References 17 publications

“…We validated our method with data from CardioNet [25], a real-world EMR. The demographic information from CardioNet appears in Table 1, and we selected 10,000 of the data points as the teacher data and 50,000 of the data points as student data.…”

Section: Results · mentioning · confidence: 99%

Self–Training With Quantile Errors for Multivariate Missing Data Imputation for Regression Problems in Electronic Medical Records: Algorithm Development Study

Gwon, Ahn, Kim, et al. (2021). JMIR Public Health Surveill. Self-citation.

Background: When machine learning is applied in the real world, missing values are the first problem encountered. Methods to impute missing values include statistical methods such as mean imputation, expectation-maximization, and multiple imputation by chained equations (MICE), as well as machine learning methods such as multilayer perceptron, k-nearest neighbor, and decision tree.

Objective: The objective of this study was to impute numeric medical data such as physical data and laboratory data. We aimed to impute data effectively using a progressive method called self-training in the medical field, where training data are scarce.

Methods: In this paper, we propose a self-training method that gradually increases the available data. Models trained with complete data predict the missing values in incomplete data. Among the incomplete data, records whose missing values are validly predicted are incorporated into the complete data. Using the predicted value as the actual value is called pseudolabeling. This process is repeated until a stopping condition is satisfied. The most important part of this process is how to evaluate the accuracy of pseudolabels; they can be evaluated by observing the effect of the pseudolabeled data on the performance of the model.

Results: In self-training using random forest (RF), mean squared error was up to 12% lower than with pure RF, and the Pearson correlation coefficient was 0.1% higher. This difference was confirmed statistically: in the Friedman test performed on MICE and RF, self-training showed a P value between .003 and .02, and a Wilcoxon signed-rank test performed on mean imputation showed the lowest possible P value, 3.05e-5, in all situations.

Conclusions: Self-training showed significant results when comparing predicted values with actual values, but it needs to be verified in an actual machine learning system. In addition, self-training has the potential to improve performance depending on the pseudolabel evaluation method, which will be the main subject of our future research.
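The self-training loop described in the Methods paragraph can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' implementation: a k-nearest-neighbor regressor (the hypothetical `knn_predict`) stands in for the random forest used in the study, only a single target column is imputed, and a pseudolabel is accepted when adding it does not increase mean squared error on held-out complete rows. That acceptance rule is one simple way to observe "the effect of the pseudolabeled data on the performance of the model"; the paper's quantile-error criterion is not reproduced here. All function names and the every-fourth-row validation split are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def knn_predict(train, x, k=3):
    """Mean target of the k nearest training rows (hypothetical stand-in for RF)."""
    nearest = sorted(train, key=lambda row: dist(row[0], x))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def mse(train, val, k=3):
    """Mean squared error of the k-NN model built on `train`, scored on `val`."""
    return sum((knn_predict(train, x, k) - t) ** 2 for x, t in val) / len(val)


def self_train_impute(X, y, k=3, max_rounds=10):
    """Impute missing targets (None) in y by self-training with pseudolabels.

    Assumes at least 2 complete rows. A pseudolabel is kept only if adding it
    does not raise validation MSE (assumed criterion); rows whose pseudolabels
    are never accepted stay None.
    """
    complete = [(tuple(x), t) for x, t in zip(X, y) if t is not None]
    val = complete[::4]  # hypothetical split: every 4th complete row is held out
    train = [row for i, row in enumerate(complete) if i % 4 != 0]
    pending = [i for i, t in enumerate(y) if t is None]
    out = list(y)
    best = mse(train, val, k)

    for _ in range(max_rounds):
        accepted = []
        for i in pending:
            x = tuple(X[i])
            label = knn_predict(train, x, k)  # pseudolabel from the current model
            score = mse(train + [(x, label)], val, k)
            if score <= best:  # pseudolabel does not hurt validation error
                train.append((x, label))  # grow the "complete" data
                out[i] = label
                best = score
                accepted.append(i)
        if not accepted:  # stopping condition: no pseudolabel passed this round
            break
        pending = [i for i in pending if i not in accepted]
    return out
```

For example, `self_train_impute([[0], [1], [2], [3], [1.5]], [0.0, 2.0, 4.0, 6.0, None], k=2)` returns the observed targets unchanged and, in the last position, either a pseudolabel or `None`, depending on whether that pseudolabel passed validation. Because each pass retrains on the enlarged set, later rows can benefit from earlier accepted pseudolabels, which is the "gradually increases the available data" behavior the abstract describes.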

“…We can confirm this with practical medical records. The collection of data and data preparation received Asan Medical Center and Ulsan University Hospital institutional review board approval with waived informed consent (AMCCV 2016-26 ver2.1) [25]. Figure 3 shows a boxplot of 2 features — chloride and PT(INR).…”

Section: Methods · mentioning · confidence: 99%

Self–Training With Quantile Errors for Multivariate Missing Data Imputation for Regression Problems in Electronic Medical Records: Algorithm Development Study

Gwon, Ahn, Kim, et al. (2021). JMIR Public Health Surveill. Self-citation.

“…Data were extracted from CardioNet [18] (Textbox 1), a manually curated EHR database specialized in CVDs. CardioNet consists of data from 572,811 patients who had visited Asan Medical Center (AMC) with CVDs between January 1, 2000, and December 31, 2016.…”

Section: Methods · mentioning · confidence: 99%

“…Data were extracted from CardioNet [18] … From the 572,811 patients in CardioNet, we obtained 84,251 records of 63,261 anonymous patients hospitalized in the departments of cardiology or thoracic surgery. Furthermore, to develop a practical and usable model, we focused on predicting discharge within 3 days and detecting long-term patients.…”

Section: Data Acquisition · mentioning · confidence: 99%

Machine Learning–Based Hospital Discharge Prediction for Patients With Cardiovascular Diseases: Development and Usability Study

Ahn, Gwon, Kang, et al. (2021). JMIR Med Inform. Self-citation.

Background: Effective resource management in hospitals can improve the quality of medical services by reducing labor-intensive burdens on staff, decreasing inpatient waiting time, and securing the optimal treatment time. Efficient use of hospital processes requires effective bed management, and a hospital stay longer than the optimal treatment time hinders bed management. Therefore, predicting a patient’s hospitalization period may support judicious decisions regarding bed management.

Objective: First, this study aims to develop a machine learning (ML)–based model for predicting the discharge probability of inpatients with cardiovascular diseases (CVDs). Second, we aim to assess the outcome of the predictive model and explain the primary risk factors of inpatients for patient-specific care. Finally, we aim to evaluate whether our ML-based predictive model helps manage bed scheduling efficiently and detects long-term inpatients in advance, improving the use of hospital processes and the quality of medical services.

Methods: We set up the cohort criteria and extracted the data from CardioNet, a manually curated database that specializes in CVDs. We processed the data into a suitable data set by reindexing the date index, integrating the present features with past features from the previous 3 years, and imputing missing values. Subsequently, we trained ML-based predictive models and evaluated them to find the best-performing model. Finally, we predicted the discharge probability within 3 days and explained the model’s outcomes by identifying, quantifying, and visualizing its features.

Results: We experimented with 5 ML-based models using 5-fold cross-validation. Extreme gradient boosting, which was selected as the final model, achieved an average area under the receiver operating characteristic curve (AUROC) of 0.865, higher than that of the other models (ie, logistic regression, random forest, support vector machine, and multilayer perceptron). Furthermore, we performed feature reduction, represented the feature importance, and assessed prediction outcomes. One of the outcomes, the individual explainer, provides a discharge score during hospitalization and a daily feature influence score to the medical team and patients. Finally, we visualized simulated bed management to demonstrate how the outcomes can be used.

Conclusions: In this study, we propose an individual explainer based on an ML-based predictive model, which provides the discharge probability and relative contributions of individual features. Our model can assist medical teams and patients in identifying individual and common risk factors in CVDs and can support hospital administrators in improving the management of hospital beds and other resources.
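Two preprocessing steps mentioned in the abstract above, labeling each inpatient day by whether discharge follows within 3 days and restricting past features to the previous 3 years, might look like the sketch below. The function names, the `(date, value)` event format, the inclusive 3-day cutoff, the exclusion of the discharge day itself, and the approximation of 3 years as 1,095 days are all assumptions for illustration; the abstract does not specify how the study actually reindexed dates or integrated past features.

```python
from datetime import date, timedelta

# Assumed constants: "within 3 days" treated as inclusive,
# "previous 3 years" approximated as 3 * 365 days.
HORIZON = timedelta(days=3)
LOOKBACK = timedelta(days=3 * 365)


def daily_labels(admit, discharge, horizon=HORIZON):
    """One (day, label) row per inpatient day, from admission up to (not
    including) the discharge day; label 1 if discharge is within `horizon`."""
    rows = []
    day = admit
    while day < discharge:
        rows.append((day, int(discharge - day <= horizon)))
        day += timedelta(days=1)
    return rows


def past_window(events, index_day, lookback=LOOKBACK):
    """Keep (date, value) events in [index_day - lookback, index_day)."""
    start = index_day - lookback
    return [(d, v) for d, v in events if start <= d < index_day]
```

For a stay from `date(2016, 1, 1)` to `date(2016, 1, 6)`, `daily_labels` yields labels 0, 0, 1, 1, 1 for January 1 through 5. Each labeled day could then be paired with `past_window(events, day)` to build one row per inpatient day; using a half-open window keeps same-day and future events out of the features, which avoids leaking information from after the prediction point.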

“…Typical examples include the improvement of ultra-low-dose CT and the segmentation of very small or delicate structures (e.g., coronary plaques and valves) [21,27]. Recently, electronic medical records with large data have been prepared for various AI research [50]. Cardiovascular CT powered by AI or radiomic analysis [51] can be combined with other imaging modalities or clinical information (e.g., ECG and blood laboratory tests) to guide decision-making or prognostication.…”

Section: Future Perspectives · mentioning · confidence: 99%

Application of Artificial Intelligence to Cardiovascular Computed Tomography

Yang (2021). Korean J Radiol. Self-citation.

Cardiovascular computed tomography (CT) is among the most active fields with ongoing technical innovation related to image acquisition and analysis. Artificial intelligence can be incorporated into various clinical applications of cardiovascular CT, including imaging of the heart valves and coronary arteries, as well as imaging to evaluate myocardial function and congenital heart disease. This review summarizes the latest research on the application of deep learning to cardiovascular CT. The areas covered range from image quality improvement to automatic analysis of CT images, including methods such as calcium scoring, image segmentation, and coronary artery evaluation.
