Multiple Imputation Ensembles (MIE) for Dealing with Missing Data

Aleryani, Aliya; Wang, Wenjia; Iglesia, Beatriz de la

doi:10.1007/s42979-020-00131-0

Cited by 30 publications

(17 citation statements)

References 59 publications


“…Also, in another study [123] the authors proposed a Multiple Imputation Ensembles approach for handling with missing data in classification problems. They combined multiple imputation and ensemble techniques and compared two types of ensembles namely, bagging and stacking.…”

Section: Ensemble Methods (mentioning)

confidence: 99%

A Survey On Missing Data in Machine Learning

Emmanuel, Maupong, Mpoeleng, et al. 2021

Preprint


Machine learning has been the cornerstone in analysing and extracting information from data, and often a problem of missing values is encountered. Missing values occur as a result of various factors like missing completely at random, missing at random or missing not at random. All these may be as a result of system malfunction during data collection or human error during data pre-processing. Nevertheless, it is important to deal with missing values before analysing data since ignoring or omitting missing values may result in biased or misinformed analysis. In the literature there have been several proposals for handling missing values. In this paper we aggregate some of the literature on missing data, particularly focusing on machine learning techniques. We also give insight on how the machine learning approaches work by highlighting the key features of the proposed techniques, how they perform, their limitations and the kind of data they are most suitable for. Finally, we experiment on the K nearest neighbor and random forest imputation techniques on novel power plant induced fan data and offer some possible future research direction.


“…Also, in another study Aleryani et al [120] the authors proposed a Multiple Imputation Ensembles approach for handling with missing data in classification problems. They combined multiple imputation and ensemble techniques and compared two types of ensembles namely, bagging and stacking.…”

Section: Missing Values Approaches (mentioning)

confidence: 99%

A survey on missing data in machine learning

et al. 2021


Machine learning has been the cornerstone in analysing and extracting information from data, and often a problem of missing values is encountered. Missing values occur because of various factors like missing completely at random, missing at random or missing not at random. All these may result from system malfunction during data collection or human error during data pre-processing. Nevertheless, it is important to deal with missing values before analysing data since ignoring or omitting missing values may result in biased or misinformed analysis. In the literature there have been several proposals for handling missing values. In this paper, we aggregate some of the literature on missing data, particularly focusing on machine learning techniques. We also give insight on how the machine learning approaches work by highlighting the key features of missing values imputation techniques, how they perform, their limitations and the kind of data they are most suitable for. We propose and evaluate two methods, the k nearest neighbor and an iterative imputation method (missForest) based on the random forest algorithm. Evaluation is performed on the Iris and novel power plant fan data with induced missing values at missingness rates of 5% to 20%. We show that both missForest and the k nearest neighbor can successfully handle missing values and offer some possible future research direction.


“…Most real-world datasets contain missing values. This can cause issues for a number of ML methods [109]. The percentage of missing values differed between studies.…”

Section: Handling Of Missing Data (mentioning)

confidence: 99%

What can machines learn about heart failure? A systematic literature review

Jasinska-Piadlo, Bond, Biglarbeigi, et al. 2021

Int J Data Sci Anal


This paper presents a systematic literature review with respect to application of data science and machine learning (ML) to heart failure (HF) datasets with the intention of generating both a synthesis of relevant findings and a critical evaluation of approaches, applicability and accuracy in order to inform future work within this field. This paper has a particular intention to consider ways in which the low uptake of ML techniques within clinical practice could be resolved. Literature searches were performed on Scopus (2014-2021), ProQuest and Ovid MEDLINE databases (2014-2021). Search terms included ‘heart failure’ or ‘cardiomyopathy’ and ‘machine learning’, ‘data analytics’, ‘data mining’ or ‘data science’. 81 out of 1688 articles were included in the review. The majority of studies were retrospective cohort studies. The median size of the patient cohort across all studies was 1944 (min 46, max 93260). The largest patient samples were used in readmission prediction models with the median sample size of 5676 (min. 380, max. 93260). Machine learning methods focused on common HF problems: detection of HF from available dataset, prediction of hospital readmission following index hospitalization, mortality prediction, classification and clustering of HF cohorts into subgroups with distinctive features and response to HF treatment. The most common ML methods used were logistic regression, decision trees, random forest and support vector machines. Information on validation of models was scarce. Based on the authors’ affiliations, there was a median 3:1 ratio between IT specialists and clinicians. Over half of studies were co-authored by a collaboration of medical and IT specialists. Approximately 25% of papers were authored solely by IT specialists who did not seek clinical input in data interpretation. 
The application of ML to datasets, in particular clustering methods, enabled the development of classification models assisting in testing the outcomes of patients with HF. There is, however, a tendency to over-claim the potential usefulness of ML models for clinical practice. The next body of work that is required for this research discipline is the design of randomised controlled trials (RCTs) with the use of ML in an intervention arm in order to prospectively validate these algorithms for real-world clinical utility.

