2021
DOI: 10.2196/30824

Self-Training With Quantile Errors for Multivariate Missing Data Imputation for Regression Problems in Electronic Medical Records: Algorithm Development Study

Abstract: Background: When using machine learning in the real world, the missing value problem is the first problem encountered. Methods to impute this missing value include statistical methods such as mean, expectation-maximization, and multiple imputations by chained equations (MICE) as well as machine learning methods such as multilayer perceptron, k-nearest neighbor, and decision tree. Objective: The objective of this study was to impute numeric medical data su…

citations
Cited by 5 publications
(2 citation statements)
references
References 24 publications
“…model_selection.train_test_split (X, Y, test_size = 0.20, random_state = 1), X refers to ultrasonic, Y refers to MIR, and then sklearn.linear_model.LinearRegression (scikitlearn 0.24.2) was used for model training. We randomly selected 80% of the dataset to be used in generating the working model (called "training data") and the other 20% to test the model (called "test data"), as recommended in previous literature [41][42][43]. These steps were used in creating the correction models for protein, fat, lactose, and energy content.…”
Section: Adjustment Of the Ultrasonic Methods Results Using Machine L...
mentioning
confidence: 99%
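The 80/20 split and linear fit described in the statement above can be sketched without any third-party dependency. This is only an illustration of the idea, not the citing authors' code: they report using scikit-learn's `train_test_split` and `LinearRegression`, whereas the helpers `split_80_20` and `fit_line` below are hypothetical hand-rolled stand-ins, and the `ultrasonic` and `mir` values are synthetic numbers made up for the example.

```python
import random


def split_80_20(xs, ys, test_size=0.20, seed=1):
    """Shuffle paired samples and split them into training and test sets."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_test = round(len(indices) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [xs[i] for i in train_idx]
    x_test = [xs[i] for i in test_idx]
    y_train = [ys[i] for i in train_idx]
    y_test = [ys[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic data (assumption, for illustration only): y = 0.9 * x + 0.2 plus small noise.
noise = random.Random(0)
ultrasonic = [1.0 + 0.1 * i for i in range(50)]
mir = [0.9 * x + 0.2 + noise.gauss(0, 0.01) for x in ultrasonic]

x_train, x_test, y_train, y_test = split_80_20(ultrasonic, mir)
slope, intercept = fit_line(x_train, y_train)

# Evaluate the correction model on the held-out 20%.
predictions = [slope * x + intercept for x in x_test]
mae = sum(abs(p - y) for p, y in zip(predictions, y_test)) / len(y_test)
```

Fitting on the training portion and scoring only on the held-out portion is what lets the reported error reflect data the model has not seen.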
“…Contrarily, in experimental approaches, expensive and time-consuming trials are conducted out before system characteristics and the considered results are correlated statistically. Unfortunately, both approaches' outcomes may be impacted by noisy circumstances [36]. Therefore, it would be ideal to create a reliable and accurate approach for simulating and forecasting the effects design and operational parameters on the HDH-SSF's performance.…”
mentioning
confidence: 99%