2021
DOI: 10.21203/rs.3.rs-535520/v1
Preprint

A Survey On Missing Data in Machine Learning

Abstract: Machine learning has been the cornerstone of analysing and extracting information from data, and a problem often encountered is missing values. Missing values can arise under different mechanisms: missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR). The underlying causes include system malfunction during data collection and human error during data pre-processing. Whatever the cause, it is important to deal with missing values before analysing data since ignoring or omitting missing value…
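To make the three mechanisms named in the abstract concrete, the sketch below simulates each one on a toy income variable. The variables `age` and `income`, the missingness rates and the thresholds are all hypothetical; only the MCAR/MAR/MNAR definitions follow the standard literature. The point is that complete-case means stay roughly unbiased under MCAR but drift under MAR and MNAR.

```python
import random

def mcar_mask(n, rate, rng):
    """MCAR: every entry is missing with the same probability,
    independent of any data value."""
    return [rng.random() < rate for _ in range(n)]

def mar_mask(observed, rate_low, rate_high, threshold, rng):
    """MAR: the chance that the target is missing depends only on another,
    fully observed variable (here: whether it exceeds a threshold)."""
    return [rng.random() < (rate_high if x > threshold else rate_low)
            for x in observed]

def mnar_mask(target, rate_low, rate_high, threshold, rng):
    """MNAR: the chance that a value is missing depends on the (unseen)
    value itself, e.g. high incomes are withheld more often."""
    return [rng.random() < (rate_high if y > threshold else rate_low)
            for y in target]

def apply_mask(values, mask):
    """Replace masked entries with None to represent missing values."""
    return [None if m else v for v, m in zip(values, mask)]

def observed_mean(values):
    """Mean of the non-missing entries (complete-case analysis)."""
    seen = [v for v in values if v is not None]
    return sum(seen) / len(seen)

rng = random.Random(0)
# Hypothetical toy data: income grows with age plus Gaussian noise.
age = [rng.uniform(20, 70) for _ in range(1000)]
income = [1000 * a + rng.gauss(0, 5000) for a in age]

income_mcar = apply_mask(income, mcar_mask(len(income), 0.2, rng))
income_mar = apply_mask(income, mar_mask(age, 0.05, 0.4, 50, rng))
income_mnar = apply_mask(income, mnar_mask(income, 0.05, 0.4, 50_000, rng))

for name, col in [("MCAR", income_mcar), ("MAR", income_mar), ("MNAR", income_mnar)]:
    print(name, round(observed_mean(col)), "vs full", round(observed_mean(income)))
```

Because older, higher-income records are dropped more often under MAR and MNAR, their observed means fall below the full-data mean, whereas the MCAR mean stays close to it.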

Cited by 7 publications (14 citation statements)
References: 106 publications
“…The reasons for missing values (MVs) are diverse: respondents to a household survey may refuse to report income; in industrial experiments, some results are missing because of mechanical failures unrelated to the experimental process; in medical experiments, some participants drop out because of drug allergies, death or other reasons [1]. In summary, these reasons fall roughly into four types: (1) human mistakes when processing data, (2) machine errors caused by equipment malfunction, (3) respondents' refusal to answer specific questions, and (4) drop-out from studies and the merging of unrelated data [2][3][4]. Missing data is unavoidable, even though we all know that gathering as much data as possible is the ideal strategy for data analysis.…”
Section: Introduction (mentioning)
confidence: 99%
“…They outlined a few issues with these studies, including the small size of experimental datasets and the lack of attention to missingness mechanisms. Recently, Emmanuel et al. [2] compiled some literature with a focus on machine learning methods. They tested both the KNN and random forest (RF) imputation techniques; however, they employed only two small datasets, the Iris and ID fan datasets [16].…”
Section: Introduction (mentioning)
confidence: 99%
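The KNN imputation mentioned in this statement can be illustrated with a short pure-Python sketch: each missing entry is replaced by the mean of that feature over the k most similar rows that observe it. The distance rule (Euclidean over co-observed features, rescaled by the ratio of total to co-observed features) is an assumption modelled on the `nan_euclidean` convention used by scikit-learn's `KNNImputer`; the function name `knn_impute` and the toy data are hypothetical, and this is not the code used by Emmanuel et al.

```python
import math

def knn_impute(rows, k=3):
    """Fill None entries with the mean of that feature over the k nearest
    rows that observe it. Distances use only features observed in both
    rows, rescaled to the full feature count; original values are used
    for all distance computations."""
    n_features = len(rows[0])
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j in (f for f, v in enumerate(row) if v is None):
            candidates = []
            for other_idx, other in enumerate(rows):
                if other_idx == i or other[j] is None:
                    continue  # a donor must observe the feature being filled
                shared = [f for f in range(n_features)
                          if row[f] is not None and other[f] is not None]
                if not shared:
                    continue  # no common features, distance undefined
                sq = sum((row[f] - other[f]) ** 2 for f in shared)
                candidates.append((math.sqrt(sq * n_features / len(shared)), other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed

# Hypothetical toy data with two clusters and two gaps.
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [0.9, 1.9, 2.9],
    [8.0, 9.0, 10.0],
    [8.2, None, 10.1],
    [7.9, 8.9, 9.9],
]
filled = knn_impute(data, k=2)
```

Here the gap in row 1 is filled from its two neighbours in the low cluster (giving 2.95), and the gap in row 4 from its two neighbours in the high cluster (giving 8.95).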
“…This study aims to avoid this issue by using four imputation techniques based on kNN, sliding-window (SW), regression (RI) and support vector machine-based (SVMI) algorithms. Note that these imputation techniques have recently been used in the literature (see (Malarvizhi & Thanamani, 2012) for kNN imputation, (Emmanuel et al., 2021) for SVMI, (Doreswamy & Manjunatha, 2017) for RI) and were developed for missing data in general. In this paper, those methods are adapted to right-censored data and the modelling procedure.…”
Section: Introduction (mentioning)
confidence: 99%
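Two of the four techniques listed, sliding-window and regression imputation, are simple enough to sketch directly. The exact variants used by the citing study are not given here, so the following definitions are assumptions: sliding-window imputation replaces a gap in an ordered series with the mean of the observed values within `half_width` positions on either side, and regression imputation fits an ordinary least-squares line on the complete pairs and predicts the missing responses from a fully observed predictor. Both function names are hypothetical, and neither handles right-censoring.

```python
def sliding_window_impute(series, half_width=2):
    """Fill each None with the mean of the observed values within
    half_width positions on either side (original values only)."""
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        lo, hi = max(0, i - half_width), min(len(series), i + half_width + 1)
        window = [x for x in series[lo:hi] if x is not None]
        if window:  # leave the gap if the whole window is missing
            out[i] = sum(window) / len(window)
    return out

def regression_impute(x, y):
    """Fill None entries of y with predictions from the least-squares line
    y = a + b * x fitted on complete pairs. Assumes x is fully observed
    and at least two complete pairs have distinct x values."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(pairs)
    mean_x = sum(p[0] for p in pairs) / n
    mean_y = sum(p[1] for p in pairs) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in pairs)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pairs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return [a + b * xi if yi is None else yi for xi, yi in zip(x, y)]

series_filled = sliding_window_impute([1.0, 2.0, None, 4.0, 5.0], half_width=1)
y_filled = regression_impute([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, None, 8.0])
```

The window mean ignores ordering information beyond the neighbourhood, while the regression variant borrows strength from a second variable; which is preferable depends on whether the data are a time series or tabular records.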
“…For example, the training of a feedforward neural network requires complete inputs in order for the hidden layers to feed forward valid inputs during the forward pass and then update the weights appropriately during the backpropagation step (M. L. Brown & Kros, 2003). Therefore, it is not immediately obvious how one would use such a model in the second category of techniques (i.e., without interpolation of the gaps) and this remains an open problem in the machine learning community (Caiafa et al., 2021; Emmanuel et al., 2021; Sharpe & Solly, 1995). One solution is to use a Cosine Neural Network (Randolph‐Gips, 2008).…”
Section: Introduction (mentioning)
confidence: 99%
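One widely used workaround for models that require complete inputs (not the Cosine Neural Network cited above) is to fill each gap with a constant and append one binary missingness indicator per feature, so that the network can learn to discount filled values. Because it still fills the gaps, it does not settle the open problem this statement describes. The sketch assumes a single dense tanh layer with hypothetical weights; `with_missing_indicators` and `dense_forward` are illustrative names, not a library API.

```python
import math

def with_missing_indicators(rows, fill_value=0.0):
    """Turn rows containing None into complete numeric vectors by
    (1) replacing each None with fill_value and (2) appending a 0/1
    indicator per feature recording whether it was missing."""
    encoded = []
    for row in rows:
        filled = [fill_value if v is None else v for v in row]
        mask = [1.0 if v is None else 0.0 for v in row]
        encoded.append(filled + mask)
    return encoded

def dense_forward(x, weights, bias):
    """Forward pass of one fully connected layer with tanh activation."""
    return [math.tanh(sum(w * xi for w, xi in zip(w_row, x)) + b)
            for w_row, b in zip(weights, bias)]

encoded = with_missing_indicators([[0.5, None], [None, -1.0]])
# Hypothetical weights: 2 hidden units, 4 inputs (2 values + 2 indicators).
weights = [[0.1, 0.1, 0.1, 0.1], [0.2, -0.2, 0.0, 0.0]]
bias = [0.0, 0.0]
hidden = [dense_forward(x, weights, bias) for x in encoded]
```

The input width doubles, but every forward pass now receives valid numbers, and the indicator columns let backpropagation adjust how much weight a filled value carries.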