Supply and demand are increasing in response to healthcare trends, and personal health records (PHRs) are increasingly managed by individuals. Such records are collected through different channels and vary considerably in type and scope depending on the circumstances. As a result, some data may be missing, which degrades data analysis, so the missing entries should be replaced with appropriate values. This study proposes a method for estimating missing data in healthcare big data using a multi-modal autoencoder. The proposed method uses a stacked denoising autoencoder to estimate the missing data that arise during the data collection and processing stages. Autoencoders are neural networks that output a value x̂ similar to an input value x. The present study uses data from the Korean National Health Nutrition Examination Survey (KNHNES), conducted by the Korea Centers for Disease Control and Prevention (KCDC). As representative healthcare data from South Korea, they contain a large number of parameters identical to those used in PHRs. Models built on these data can therefore estimate missing data occurring in PHRs. Furthermore, PHRs are multimodal: data about a single subject are collected from multiple sources. The stacked denoising autoencoder is therefore configured in a multi-modal setting. Through pre-processing, a subset of the KNHNES data without missing values is constructed. During training on this data set, the original data serve as the label, and the autoencoder input is a noised copy in which entries are randomly set to zero, with the number of zeros determined by the noise factor. In this way, the autoencoder learns to map the zeroed noise values back to values similar to the original label.
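The noising step described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the noise factor is the fraction of entries set to zero in each record, and the function name `add_zero_noise` and the sample values are hypothetical.

```python
import random


def add_zero_noise(row, noise_factor, rng):
    """Return a copy of `row` with a fraction `noise_factor` of entries zeroed.

    Assumption: the noise factor is read as a fraction of the record's
    entries; the abstract does not define it more precisely.
    """
    n_zero = round(noise_factor * len(row))
    # Pick distinct positions to zero out, simulating missing values.
    zero_idx = set(rng.sample(range(len(row)), n_zero))
    return [0.0 if i in zero_idx else value for i, value in enumerate(row)]


rng = random.Random(0)

# Made-up, already-normalised values standing in for one complete record.
original = [0.62, 0.35, 0.80, 0.47, 0.91, 0.15, 0.58, 0.73]

noised = add_zero_noise(original, noise_factor=0.25, rng=rng)

# One training pair: the noised record is the input, the original is the label.
training_pair = (noised, original)
```

A denoising autoencoder trained on many such pairs is pushed to reconstruct the zeroed entries from the remaining ones, which is what makes it usable for imputation.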
When approximately 25% of the data in a dataset are missing, the proposed multi-modal stacked denoising autoencoder achieves an accuracy of 0.9217, which is higher than that of other conventional methods. A single-modal denoising autoencoder achieves 0.932; the difference of approximately 0.01 falls within the allowable limits for data analysis. In terms of computational cost, the single-modal autoencoder has 10,384 parameters, 5,594 more than the multi-modal stacked autoencoder, and the number of parameters affects the speed of the model. The two models thus differ substantially in parameter count but only slightly in accuracy, suggesting that the proposed multi-modal stacked denoising autoencoder is preferable to a single-modal model on a personal device. Moreover, a multi-modal model can save additional time when processing large amounts of data in settings such as hospitals and institutions. INDEX TERMS: Autoencoder, data pre-processing, data estimation, data imputation, health big data, multimodal, missing data, machine learning.
BACKGROUND: People supply their bodies with a variety of nutrients through their diet, and these nutrients are directly related to health. Chronic diseases build up in the body over a long time because of heredity or lifestyle habits, and they have become a central issue in the era of disease-controlled longevity. It is therefore essential to manage health continuously by improving dietary habits. OBJECTIVE: To satisfy both users' health needs and their preferences by recommending alternative food products whose diet and nutrition structure is similar to that of food products that positively influence the users' health conditions. METHOD: We used a hybrid clustering based food recommendation method that combines chronic disease based clustering, a diet and nutrition ontology, and a diet and nutrition knowledge base. Each active user is assigned to the chronic disease based cluster at the smallest Euclidean distance. Food products are recommended to users according to their assigned cluster, and similar food products are also recommended using food clustering and the knowledge base. Food products are clustered using the k-means algorithm and the food and nutrient data system. The diet and nutrition knowledge base is generated from the resulting food clusters and food preference data. It is composed of a food cluster filter, a food similarity filter, a universal preference filter, and a user feedback filter. The universal preference filter represents the similarity weight between diet and nutrition and user preference. The user feedback filter holds the similarity weight between active user preference and diet and nutrition. These filters continue to be updated through associated feedback. RESULT: Because the proposed health decision-making method takes each user's health condition into account, it is more precise than an existing recommendation method.
In addition, the proposed method yields better evaluation results than a general recommendation method based on user-by-user health context information. CONCLUSION: Recommending food products related to users' chronic diseases through the proposed hybrid clustering can support their healthcare. In addition, letting users flexibly receive satisfying feedback can help improve their dietary habits.
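The cluster assignment step in the METHOD above, where an active user is placed in the chronic disease based cluster at the smallest Euclidean distance, can be sketched as follows. This is an illustration under assumptions, not the authors' code: the cluster names, feature vectors, and the function `nearest_cluster` are hypothetical, and the abstract does not say which features describe a user.

```python
import math


def nearest_cluster(user_vector, centroids):
    """Return the name of the centroid closest to `user_vector`.

    Closeness is plain Euclidean distance, as stated in the abstract.
    """
    return min(
        centroids,
        key=lambda name: math.dist(user_vector, centroids[name]),
    )


# Hypothetical cluster centroids over three made-up health features.
centroids = {
    "cluster_a": [0.2, 0.8, 0.1],
    "cluster_b": [0.7, 0.3, 0.9],
}

# Hypothetical feature vector for one active user.
active_user = [0.6, 0.4, 0.8]

assigned = nearest_cluster(active_user, centroids)
print(assigned)
```

In this sketch the centroids are given directly; in the described method they would come from the chronic disease based clustering, and the assigned cluster would then drive which food products are recommended.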