Data mining requires a pre-processing stage in which the data are prepared, cleaned, integrated, transformed, reduced and discretized to ensure quality. Missing values are a universal problem across research domains and are commonly encountered during data cleaning. A missing value occurs when no value is stored for a variable in an observation. Missing values have undesirable effects on analysis results, especially when they lead to biased parameter estimates. Data imputation is a common way to deal with missing values, in which substitutes for the missing values are found through statistical or machine learning techniques. Examining the strengths and limitations of these techniques is nevertheless important for understanding their characteristics. In this paper, the performance of three machine learning classifiers (K-Nearest Neighbors (KNN), Decision Tree, and Bayesian Networks) is compared in terms of data imputation accuracy. The results show that, among the three classifiers, the Bayesian Network has the most promising performance.
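To make "data imputation accuracy" concrete, the following is a minimal sketch of how a KNN classifier could fill in a missing categorical value and how its accuracy could be scored by hiding values whose true labels are known. It is not the paper's implementation: the function names `knn_impute` and `imputation_accuracy`, the Euclidean distance, the majority vote, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_impute(complete_rows, query, target_idx, k=3):
    """Impute query[target_idx] by majority vote among the k nearest complete rows.

    Hypothetical helper: distance is Euclidean over every column except the
    one being imputed, which is an assumption and not taken from the paper.
    """
    def distance(row):
        return math.sqrt(sum((row[i] - query[i]) ** 2
                             for i in range(len(row)) if i != target_idx))

    neighbours = sorted(complete_rows, key=distance)[:k]
    votes = Counter(row[target_idx] for row in neighbours)
    return votes.most_common(1)[0][0]


def imputation_accuracy(complete_rows, masked_cases, target_idx, k=3):
    """Fraction of deliberately hidden values that the imputer recovers."""
    hits = sum(knn_impute(complete_rows, query, target_idx, k) == truth
               for query, truth in masked_cases)
    return hits / len(masked_cases)


# Toy data (invented for this sketch): two numeric features and one label.
complete = [
    (1.0, 1.1, "a"), (0.9, 1.0, "a"), (1.2, 0.8, "a"),
    (5.0, 5.2, "b"), (5.1, 4.9, "b"), (4.8, 5.0, "b"),
]
# Each case pairs a row whose label was hidden with the true label.
masked = [((1.1, 0.9, None), "a"), ((5.0, 5.0, None), "b")]

print(imputation_accuracy(complete, masked, target_idx=2))
```

The same scoring loop would apply to any other imputer with the same call shape, which is one way a comparison across classifiers could be organized.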
Asphalt cracks are one of the major road damage problems in civil engineering because they can threaten road and highway safety. Crack detection and classification is a challenging task because complicated pavement conditions, such as shadows, oil stains and water spots, result in poor visibility and low contrast between cracks and the surrounding pavement. This paper proposes a fully automated crack detection and classification method using a deep convolutional neural network (DCNN) architecture. First, images of pavement cracks are manually prepared in RGB format at 1024x768 pixels, captured with a NIKON digital camera. Next, the original images are segmented into 32x32-pixel patches to form a training dataset, and the DCNN is trained with two different filter sizes: 3x3 and 5x5. The proposed method successfully detected the presence of cracks in the images with 98%, 99% and 99% recall, precision and accuracy, respectively. The network was also able to automatically classify the pavement images into no crack, transverse, longitudinal and alligator classes with acceptable classification accuracy for both filter sizes. There was no significant difference in classification accuracy between the two filter sizes. However, the smaller filter size needs more training time than the larger filter size. Overall, the proposed method achieved an accuracy of 94.5% in classifying the different types of crack.
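The patch step and the reported metrics in the abstract above can be illustrated with a short sketch. It assumes the 32x32 patches are non-overlapping tiles, which the abstract does not state, and the names `patch_origins` and `detection_metrics` are hypothetical. The confusion counts in the usage lines are invented and are not the paper's results.

```python
def patch_origins(width, height, patch=32):
    """Top-left (x, y) corners of non-overlapping patch x patch tiles.

    Assumption: tiles do not overlap and partial tiles at the border are
    dropped; the paper may have sampled patches differently.
    """
    return [(x, y)
            for y in range(0, height - patch + 1, patch)
            for x in range(0, width - patch + 1, patch)]


def detection_metrics(tp, fp, fn, tn):
    """Recall, precision and accuracy from binary confusion counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return recall, precision, accuracy


# A 1024x768 image tiles into 32 columns by 24 rows of 32x32 patches.
origins = patch_origins(1024, 768)
print(len(origins))

# Invented counts, only to show how the three figures are computed.
print(detection_metrics(tp=8, fp=2, fn=2, tn=8))
```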
Missing data is one of the most common issues encountered in the data cleaning process, especially when dealing with medical datasets. A real collected dataset is prone to be incomplete, inconsistent, noisy and redundant for reasons such as human errors, instrumental failures, and adverse death. Therefore, to deal accurately with incomplete data, a sophisticated algorithm is proposed to impute the missing values. Many machine learning algorithms have been applied to impute missing data with plausible values. Among them, the KNN algorithm has been widely adopted for imputing missing data because of its robustness and simplicity, and it shows promise of outperforming other machine learning methods. This paper provides a comprehensive review of the different imputation techniques used to replace missing data. The goal of the review is to draw attention to potential improvements to existing methods and to give readers a better grasp of trends in imputation techniques.
If patients with incipient Acute Kidney Injury (AKI) and those at high risk of developing AKI could be identified, clinicians could intervene during what may be a crucial stage for preventing permanent kidney injury. This paper proposes an improved mechanism for machine learning imputation algorithms by introducing the Particle Swarm Levy Flight algorithm. We improve the algorithms by modifying the Particle Swarm Optimization (PSO) algorithm, enhancing it with Levy flight (PSOLF). The proposed algorithms are tested on the creatinine dataset that we collected, which includes AKI diagnosis and staging, mortality at hospital discharge, and renal recovery, and are compared with other machine learning algorithms such as the Genetic Algorithm and traditional PSO. The performance of the proposed algorithms is validated with a statistical significance test. The results show that SVMPSOLF performs better than the other methods. This research could serve as an important prognostic tool for determining which patients are likely to suffer from AKI, potentially allowing clinicians to intervene before kidney damage manifests.
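The abstract above does not say how Levy flight is combined with PSO, so the following is only a sketch of one common pattern: a standard PSO update in which a particle occasionally takes a heavy-tailed Levy jump instead of a velocity step. The Levy step uses Mantegna's algorithm. The function names, the jump probability `p_levy`, the 0.01 step scale, the inertia and acceleration constants, the search bounds, and the sphere test objective are all assumptions for illustration, not values from the paper.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """One heavy-tailed step drawn with Mantegna's algorithm."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
               ) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def psolf_minimize(objective, dim, n_particles=20, iters=100,
                   p_levy=0.2, bound=5.0, seed=0):
    """Hypothetical PSO variant with occasional Levy jumps (all settings assumed)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    initial_best = gbest_val
    w, c1, c2 = 0.7, 1.5, 1.5  # assumed inertia and acceleration constants

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                if rng.random() < p_levy:
                    # Levy jump scaled by the distance to the global best.
                    pos[i][d] += 0.01 * levy_step(rng) * (pos[i][d] - gbest[d])
                else:
                    # Standard PSO velocity and position update.
                    vel[i][d] = (w * vel[i][d]
                                 + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                                 + c2 * rng.random() * (gbest[d] - pos[i][d]))
                    pos[i][d] += vel[i][d]
                # Keep the particle inside the search box.
                pos[i][d] = max(-bound, min(bound, pos[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val, initial_best


def sphere(x):
    """Toy objective used only to exercise the optimizer."""
    return sum(xi * xi for xi in x)


best, best_val, start_val = psolf_minimize(sphere, dim=2)
print(best_val, start_val)
```

In a setting like the one the abstract describes, the objective would presumably score a candidate configuration of the downstream learner rather than a toy function, but that wiring is not specified in the source.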