Data-driven decision-making has recently attracted great interest, and it requires high-quality datasets. However, real-world datasets often contain missing values, whether for unknown reasons or by design, which undermines the accuracy of data-driven decisions. If a machine learning model is trained on an incomplete dataset with missing values, its inferences may be biased. A commonly used remedy is missing value imputation (MVI), which fills missing entries with plausible values estimated from the observed data. Various imputation methods based on machine learning, statistical inference, and relational database theory have been developed. Among them, conventional machine learning-based imputation methods for tabular data either handle only numerical columns or are time-consuming and cumbersome because they build a separate predictive model for each column. We therefore developed a novel imputation neural network, which we term the Denoising Self-Attention Network (DSAN). The DSAN handles tabular datasets containing both numerical and categorical columns: numerical values are discretized and treated as categorical values in its embedding and self-attention layers. Furthermore, the DSAN learns robust feature representation vectors by combining self-attention with denoising, and it predicts multiple appropriate imputed values simultaneously via multi-task learning. To validate the method, we performed imputation experiments on several real-world tabular datasets in which missing values were generated artificially. Evaluating both imputation and downstream-task performance, we found that the DSAN outperformed the other models, especially for categorical variable imputation.
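To make the described architecture concrete, the following is a minimal sketch of a denoising self-attention imputer in PyTorch. It is not the authors' implementation: the column cardinalities, masking probability, layer sizes, and the use of a reserved mask token are illustrative assumptions. It only shows the core idea of treating pre-discretized numerical columns as categories, embedding each cell, corrupting inputs during training, and predicting every column jointly with per-column heads.

```python
# Hypothetical sketch of a DSAN-style imputer; shapes and hyperparameters are illustrative.
import torch
import torch.nn as nn

class DenoisingSelfAttentionImputer(nn.Module):
    def __init__(self, cardinalities, d_model=32, n_heads=4, n_layers=2, mask_prob=0.15):
        super().__init__()
        self.mask_prob = mask_prob
        self.cardinalities = cardinalities
        # One embedding table per column; index `card` is reserved as a mask/missing token.
        self.embeddings = nn.ModuleList(
            [nn.Embedding(card + 1, d_model) for card in cardinalities]
        )
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # One classification head per column (multi-task): predict that column's category.
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, card) for card in cardinalities]
        )

    def forward(self, x):
        # x: (batch, n_cols) integer codes; numerical columns are assumed pre-binned.
        corrupted = x.clone()
        if self.training:  # denoising: randomly replace some cells with the mask token
            noise = torch.rand_like(x, dtype=torch.float) < self.mask_prob
            for j, card in enumerate(self.cardinalities):
                corrupted[:, j] = torch.where(
                    noise[:, j], torch.full_like(x[:, j], card), corrupted[:, j]
                )
        tokens = torch.stack(
            [emb(corrupted[:, j]) for j, emb in enumerate(self.embeddings)], dim=1
        )  # (batch, n_cols, d_model)
        h = self.encoder(tokens)
        return [head(h[:, j]) for j, head in enumerate(self.heads)]

# Toy usage: 3 columns with 4, 6, and 10 categories (the last being a binned numerical column).
model = DenoisingSelfAttentionImputer(cardinalities=[4, 6, 10])
x = torch.cat(
    [torch.randint(0, 4, (8, 1)), torch.randint(0, 6, (8, 1)), torch.randint(0, 10, (8, 1))],
    dim=1,
)
logits = model(x)
loss = sum(nn.functional.cross_entropy(l, x[:, j]) for j, l in enumerate(logits))
loss.backward()
```

At inference time, missing cells would be encoded with the mask token and each column's head read out for the corresponding prediction, mirroring the multi-task setup above.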
BACKGROUND: Intraoperative hypotension (IOH) is associated with an increased risk of postoperative complications. In recent years, various models for IOH prediction based on high-dimensional signal data have therefore been developed. Given the association between high-dimensional data and overfitting, it is important to establish a strategy to prevent overfitting; however, such strategies have received little discussion.
OBJECTIVE: This work aimed to develop an overfitting-resistant deep learning model that uses preoperative patient data together with intraoperative bio-signal information to predict IOH approximately 5 minutes before its occurrence.
METHODS: Mean arterial blood pressure (MBP, recorded at 2-second intervals) and electronic medical records of 990 patients from the open-source database VitalDB were integrated for this study. IOH was defined as an MBP < 65 mmHg lasting more than 1 minute. Our proposed deep learning model incorporates the dropout method to prevent overfitting and a permutation method to reduce the model's dependence on the American Society of Anesthesiologists (ASA) physical status for IOH prediction; the ASA status was permuted during model training. The primary outcome was evaluated in terms of the area under the receiver operating characteristic curve (AUROC).
RESULTS: The model with the permutation method performed better (AUROC, 95% confidence interval [CI]: 0.842, 0.838-0.845) than the model without it (AUROC, 95% CI: 0.830, 0.825-0.835). Furthermore, the model with both the permutation and dropout methods exhibited the best performance (AUROC, 95% CI: 0.862, 0.859-0.861).
CONCLUSIONS: Our work demonstrates the effectiveness of the permutation method in preventing overfitting. Introducing the permutation of the ASA status together with dropout into a deep learning model can prevent overfitting and improve the accuracy of IOH prediction.
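As a rough illustration of the permutation idea described above, the sketch below shuffles the ASA status column across patients within each training batch so the network cannot exploit its association with the label, alongside standard dropout layers. The network shape, feature split, column index, and hyperparameters are assumptions for the example, not details from the paper.

```python
# Hedged sketch: batch-wise permutation of the ASA status feature during training, plus dropout.
import torch
import torch.nn as nn

class IOHPredictor(nn.Module):
    def __init__(self, n_signal_features, n_clinical_features, p_drop=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_signal_features + n_clinical_features, 128),
            nn.ReLU(),
            nn.Dropout(p_drop),          # dropout against overfitting
            nn.Linear(128, 32),
            nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(32, 1),            # logit for IOH within the next ~5 minutes
        )

    def forward(self, signal, clinical):
        return self.net(torch.cat([signal, clinical], dim=1)).squeeze(-1)

def train_step(model, optimizer, signal, clinical, asa_col, labels, permute_asa=True):
    """One training step; `asa_col` is the column index of the ASA status in `clinical`."""
    model.train()
    if permute_asa:
        # Permute ASA status across patients in the batch, breaking its link to the label.
        perm = torch.randperm(clinical.size(0))
        clinical = clinical.clone()
        clinical[:, asa_col] = clinical[perm, asa_col]
    logits = model(signal, clinical)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random data (MBP-derived signal features plus clinical variables incl. ASA).
model = IOHPredictor(n_signal_features=150, n_clinical_features=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
signal, clinical = torch.randn(16, 150), torch.randn(16, 8)
labels = torch.randint(0, 2, (16,))
train_step(model, opt, signal, clinical, asa_col=0, labels=labels)
```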
Most text classification systems use machine learning algorithms; among these, naïve Bayes and support vector machine algorithms adapted to handle text data afford reasonable performance. Recently, with developments in deep learning, several researchers have applied deep neural networks (recurrent and convolutional neural networks) to improve text classification. However, deep learning-based text classification has not greatly improved performance over conventional algorithms. This is because a textual document is ultimately represented as a single vector over word dimensions, which discards inherent semantic information even when the vector is transformed to incorporate conceptual information. To address this 'loss of term senses' problem, we develop a concept-driven deep neural network based on our semantic tensor space model. The semantic tensor used for text representation preserves the dependency between terms and concepts; we use it to develop three deep neural networks for text classification. Experiments on three standard document corpora show that our proposed methods outperform both traditional and more recent learning methods.
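The following toy sketch illustrates the kind of term-concept representation the abstract alludes to; it is not the paper's construction. The concept dictionary, association weights, and vocabulary are invented for illustration, and they show only why a term-by-concept matrix retains sense information that a flat bag-of-words vector collapses away.

```python
# Illustrative only: a document as a term-by-concept matrix rather than a flat term vector.
import numpy as np

term_to_idx = {"bank": 0, "river": 1, "loan": 2, "water": 3}
concept_to_idx = {"finance": 0, "nature": 1}

# Hypothetical term->concept association weights (could come from a thesaurus or sense inventory).
term_concept = {
    "bank": {"finance": 0.7, "nature": 0.3},
    "river": {"nature": 1.0},
    "loan": {"finance": 1.0},
    "water": {"nature": 1.0},
}

def doc_to_tensor_slice(tokens):
    """Return a (|terms| x |concepts|) matrix for one document.

    Stacking these slices over a corpus gives a third-order document x term x concept
    tensor; summing over the concept axis recovers an ordinary bag-of-words vector,
    which is exactly where the term-sense information is lost.
    """
    m = np.zeros((len(term_to_idx), len(concept_to_idx)))
    for tok in tokens:
        if tok in term_to_idx:
            for concept, w in term_concept[tok].items():
                m[term_to_idx[tok], concept_to_idx[concept]] += w
    return m

doc = ["the", "bank", "approved", "the", "loan"]
slice_ = doc_to_tensor_slice(doc)
print(slice_.shape)        # (4, 2)
print(slice_.sum(axis=1))  # collapsing concepts yields the plain term-count vector
```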