Missing data are a universal data quality problem in many domains, leading to misleading analysis and inaccurate decisions. Much research has been done to investigate the different mechanisms of missing data and the proper techniques in handling various data types. In the last decade, machine learning has been utilized to replace conventional methods to address the problem of missing values more efficiently. By studying and analyzing recently proposed methods using machine learning approaches, vital adoptions in accuracy, performance, and time consumed can be highlighted. This study aimed to help data analysts and researchers address the limitations of machine learning imputation methods by conducting a systematic literature review to provide a comprehensive overview of using such methods to impute missing values. Novel proposed machine learning approaches used for data imputation are analyzed and summarized to assist researchers in selecting a proper machine learning method based on several factors and settings. The review was performed on research studies published between 2016 and 2021 on adopting machine learning to impute missing values, focusing on their strengths and limitations. A total of 684 research articles from various scientific databases were analyzed using search engines, and 94 of them were selected as primary studies. Finally, several recommendations were given to guide future researchers in applying machine learning to impute missing values.INDEX TERMS Systematic literature review, data imputation, data mining, missingness, data preprocessing, data quality.
Malware attack cases continue to rise in our current day. The Trojan attack, which may be extremely destructive by unlawfully controlling other users' computers in order to steal their data. As a result, Trojan horse detection is essential to identify the Trojan and limit Trojan attacks. In this study, we proposed a Trojan detection system that employed machine learning algorithms to detect Trojan horses within the system. A public dataset of Trojan horses that contain 2001 samples comprises of 1041 Trojan horses and 960 of benign is used to train the machine learning classification. In this paper, the Trojan detection system is trained using four types of classifiers which are Random Forest, J48, Decision Table and Naïve Bayes. WEKA is used for the execution of the classification process and performance analysis. The results indicated that the detection system trained with the Random Forest and Decision Table algorithms obtained the maximum level of accuracy.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.