Thabo Semong scite author profile

Machine learning has been the cornerstone of analysing and extracting information from data, and missing values are a problem that is often encountered. Missing values arise under different mechanisms: missing completely at random, missing at random, or missing not at random. All of these may result from system malfunction during data collection or human error during data pre-processing. Nevertheless, it is important to deal with missing values before analysing data, since ignoring or omitting them may result in biased or misinformed analysis. Several approaches for handling missing values have been proposed in the literature. In this paper, we aggregate some of the literature on missing data, focusing in particular on machine learning techniques. We also give insight into how the machine learning approaches work by highlighting the key features of missing-value imputation techniques, how they perform, their limitations, and the kind of data they are most suitable for. We propose and evaluate two methods: k nearest neighbor and an iterative imputation method (missForest) based on the random forest algorithm. Evaluation is performed on the Iris dataset and novel power plant fan data with induced missing values at missingness rates of 5% to 20%. We show that both missForest and k nearest neighbor can successfully handle missing values, and we offer some possible future research directions.
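
The evaluation setup described in this abstract (inducing missing values at a chosen rate, then imputing them with k nearest neighbors) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `induce_mcar` and `knn_impute`, the rescaled Euclidean distance, and the column-mean fallback are all assumptions made for illustration, and missForest is not covered here.

```python
import math
import random


def induce_mcar(rows, rate, rng):
    """Return a copy of `rows` where each cell is set to None with probability `rate`.

    This mimics values missing completely at random (MCAR); `rng` is a
    random.Random instance so that runs are reproducible.
    """
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def _distance(a, b):
    """Euclidean distance over features observed in both rows.

    The squared sum is rescaled by (total features / shared features) so rows
    with few shared features are not favoured. Returns inf if nothing is shared.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows, k=3):
    """Fill each None cell with the mean of that feature over the k nearest rows.

    Only rows that have the feature observed are candidates. If no candidate is
    reachable, fall back to the column mean (an assumption of this sketch); if
    the whole column is missing, the cell stays None.
    """
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for idx, other in enumerate(rows):
                if idx == i or other[j] is None:
                    continue
                d = _distance(row, other)
                if d < math.inf:
                    candidates.append((d, other[j]))
            candidates.sort()
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
            else:
                observed = [r[j] for r in rows if r[j] is not None]
                imputed[i][j] = sum(observed) / len(observed) if observed else None
    return imputed
```

To mirror the 5% to 20% range mentioned above, one could call `induce_mcar(data, rate, random.Random(seed))` for each rate, impute with `knn_impute`, and compare the imputed cells against the original values.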

A new evolutionary neural networks based on intrusion detection systems using locust swarm optimization

Benmessahel, Xie, Chellal, et al. 2019. Evol. Intel.

A Survey On Missing Data in Machine Learning

Emmanuel, Maupong, Mpoeleng, et al. 2021. Preprint

Machine learning has been the cornerstone of analysing and extracting information from data, and missing values are a problem that is often encountered. Missing values arise under different mechanisms: missing completely at random, missing at random, or missing not at random. All of these may result from system malfunction during data collection or human error during data pre-processing. Nevertheless, it is important to deal with missing values before analysing data, since ignoring or omitting them may result in biased or misinformed analysis. Several approaches for handling missing values have been proposed in the literature. In this paper, we aggregate some of the literature on missing data, focusing in particular on machine learning techniques. We also give insight into how the machine learning approaches work by highlighting the key features of the proposed techniques, how they perform, their limitations, and the kind of data they are most suitable for. Finally, we experiment with the k nearest neighbor and random forest imputation techniques on novel power plant fan data with induced missing values, and we offer some possible future research directions.

Intelligent Load Balancing Techniques in Software Defined Networks: A Survey

et al. 2020

In the current technology-driven era, the use of devices that connect to the internet has increased significantly. Consequently, there has been a significant increase in internet traffic. Some of the challenges that arise from the increased traffic include, but are not limited to, multiple clients on a single server (which can result in denial of service (DoS)), difficulty in network scalability, and poor service availability. One of the solutions proposed in the literature to mitigate these is the use of multiple servers with a load balancer. Despite their common use, load balancers have been shown to have some disadvantages, such as being vendor-specific and non-programmable. To address these disadvantages and improve the handling of internet traffic, there has been a paradigm shift that resulted in the introduction of software defined networking (SDN). SDN allows for load balancers that are programmable and provides the flexibility to design and implement one's own load balancing strategies. In this survey, we highlight the key elements of SDN and OpenFlow technology and their effect on load balancing. We provide an overview of the various load balancing schemes in SDN. The overview is organized around research challenges and existing solutions, and we give possible future research directions. A summary of emulators and mathematical tools commonly used in the design of intelligent load balancing SDN algorithms is provided. Finally, we outline the performance metrics used to evaluate the algorithms.
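
The idea of a programmable load balancer with interchangeable strategies can be sketched as follows. This is a generic illustration only, not an SDN controller or OpenFlow code, and not taken from the survey: the class names `RoundRobin`, `LeastLoaded`, and `LoadBalancer`, and the use of active-connection counts as the load measure, are assumptions.

```python
from itertools import cycle
from typing import Dict, List, Protocol


class Strategy(Protocol):
    """Anything with a pick() method can be plugged into the balancer."""

    def pick(self, load: Dict[str, int]) -> str: ...


class RoundRobin:
    """Hand out servers in a fixed rotating order, ignoring current load."""

    def __init__(self, servers: List[str]) -> None:
        self._order = cycle(servers)

    def pick(self, load: Dict[str, int]) -> str:
        return next(self._order)


class LeastLoaded:
    """Pick the server with the fewest active connections (ties go to the smallest name)."""

    def pick(self, load: Dict[str, int]) -> str:
        return min(sorted(load), key=lambda server: load[server])


class LoadBalancer:
    """Track per-server connection counts and delegate the choice to a strategy."""

    def __init__(self, servers: List[str], strategy: Strategy) -> None:
        self.load: Dict[str, int] = {server: 0 for server in servers}
        self.strategy = strategy

    def assign(self) -> str:
        """Choose a server for a new connection and record it."""
        server = self.strategy.pick(self.load)
        self.load[server] += 1
        return server

    def release(self, server: str) -> None:
        """Record that a connection to `server` has finished."""
        if self.load[server] > 0:
            self.load[server] -= 1
```

Swapping `RoundRobin(servers)` for `LeastLoaded()` changes the policy without touching `LoadBalancer`, which is the kind of flexibility the abstract attributes to programmable load balancing.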

Multi-Source Multicast Routing with QoS Constraints in Network Function Virtualization

Xie, Zhou, Semong, et al. 2019
