Irrelevant features in a heart disease dataset degrade the performance of a binary classification model. Consequently, eliminating irrelevant and redundant features from the training set with a feature selection algorithm significantly improves the performance of a classification model for heart disease detection. Sequential feature selection (SFS) is an effective algorithm for improving classification performance on heart disease detection while reducing computational time. In this study, the SFS algorithm is implemented to improve classifier performance on heart disease detection by removing irrelevant features and training a model on the optimal feature subset. Furthermore, exhaustive and permutation-based feature selection algorithms are implemented and compared with the SFS algorithm. The implemented and existing feature selection algorithms are evaluated using the real-world Pima Indian heart disease dataset, and the results indicate that the SFS algorithm outperforms the exhaustive and permutation-based feature selection algorithms. Overall, the results look promising, and a more effective heart disease detection model is developed with an accuracy of 99.3%.
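The greedy idea behind sequential feature selection can be illustrated with a minimal forward-selection sketch. This is not the authors' implementation: the abstract does not say whether the forward or backward variant was used, and the function name, the `score` callback, and the toy scoring function below are all assumptions made for illustration. In practice `score` would be something like the cross-validated accuracy of a classifier trained on the given feature subset.

```python
from typing import Callable, List, Tuple


def sequential_forward_selection(
    n_features: int,
    score: Callable[[Tuple[int, ...]], float],
    k: int,
) -> List[int]:
    """Greedily add the feature that yields the best score until k are chosen."""
    selected: List[int] = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        # Score every candidate feature when added to the current subset.
        best = max(remaining, key=lambda f: score(tuple(selected + [f])))
        selected.append(best)
        remaining.remove(best)
    return selected


def toy_score(subset: Tuple[int, ...]) -> float:
    """Hypothetical score: features 0 and 2 help, every other feature hurts slightly."""
    useful = {0: 0.5, 2: 0.3}
    return sum(useful.get(f, -0.05) for f in subset)


chosen = sequential_forward_selection(n_features=5, score=toy_score, k=2)
print(chosen)
```

Selecting k features this way evaluates a number of subsets that grows roughly with the product of k and the feature count, whereas an exhaustive search evaluates every subset, which is consistent with the reduced computation time the abstract attributes to SFS.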
MapReduce is the preferred computing framework for large-scale data analysis and processing applications. Hadoop is a widely used MapReduce framework across different communities due to its open-source nature. Cloud service providers such as Microsoft Azure HDInsight offer resources to customers, who pay only for what they use. However, a critical challenge for cloud service providers is meeting the service level agreement (SLA) requirement of user tasks, namely the task deadline. Currently, the onus is on the client to compute the amount of resources required to run a job in the cloud. This work presents a novel makespan model for the Hadoop MapReduce framework, namely OHMR (Optimized Hadoop MapReduce), to process data in real time and utilize system resources efficiently. OHMR presents an accurate model to compute job makespan time and also a model to provision the amount of cloud resources required to meet the task deadline. OHMR first builds a profile for each job and computes the makespan time of the job using a greedy approach. Furthermore, the Lagrange multipliers technique is applied to provision the amount of resources required to meet the task deadline. Experiments are conducted on the Microsoft Azure HDInsight cloud platform, considering different applications such as text computing and bioinformatics, to evaluate the performance of OHMR; compared with the existing model, OHMR shows significant performance improvement in terms of computation time. Overall, good correlation is reported between practical and theoretical makespan values.
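The abstract does not spell out the greedy rule OHMR uses, so the following is only a sketch of one common greedy makespan estimate: each task is assigned, in order, to the slot that becomes free earliest, and the makespan is the latest finish time. The function name and the example task durations are hypothetical.

```python
import heapq
from typing import List


def greedy_makespan(task_durations: List[float], slots: int) -> float:
    """Estimate makespan by always placing the next task on the earliest-free slot."""
    if slots < 1:
        raise ValueError("at least one slot is required")
    # Min-heap of the times at which each slot becomes free.
    free_at = [0.0] * slots
    heapq.heapify(free_at)
    makespan = 0.0
    for duration in task_durations:
        start = heapq.heappop(free_at)   # earliest available slot
        finish = start + duration
        heapq.heappush(free_at, finish)
        makespan = max(makespan, finish)
    return makespan


# Four tasks (durations taken from a hypothetical job profile) on two slots.
print(greedy_makespan([4, 3, 2, 1], slots=2))
```

An estimate of this kind can be re-evaluated for different slot counts to find the smallest allocation whose makespan stays within a deadline; the paper's own provisioning step uses Lagrange multipliers instead.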
Over the past few years, data production has increased significantly due to the growth of Internet-dependent technologies. Big data allows for an evolving paradigm change in data discovery and use. Big data is processed using the MapReduce framework in a scalable and distributed manner. Improving the performance of job scheduling across the nodes of a Hadoop cluster is an optimization problem. A scheduling algorithm is proposed to optimize MapReduce jobs by reducing the budget and execution time of cloud models. Experiments for the proposed method have been carried out on word count and sessionization applications over web server log files of different sizes. Experimental results show that the average completion time is reduced with the proposed method compared to the FIFO and fair schedulers.
Hadoop is a leading framework for processing and analyzing big data. MapReduce is a programming model for processing data in parallel and distributed computing environments. One of the issues in HDFS is the number of disk I/O operations involved in fetching input data blocks from disk to memory for every task. Since data-intensive tasks are dominated by file access operations, an in-memory cache approach is proposed in this paper to reduce the number of disk I/O operations, which in turn results in reduced execution time and increased throughput. The intermediate data from mapper having a large number of duplicate