D C Vinutha scite author profile

et al. 2021

Bulletin EEI

Irrelevant features in a heart disease dataset degrade the performance of a binary classification model. Consequently, eliminating irrelevant and redundant features from the training set with a feature selection algorithm significantly improves classification performance on heart disease detection. Sequential feature selection (SFS) is a successful algorithm for improving classifier performance on heart disease detection while reducing computational time complexity. In this study, the SFS algorithm is implemented to improve classifier performance by removing irrelevant features and training a model on the optimal features. Furthermore, exhaustive and permutation-based feature selection algorithms are implemented and compared with the SFS algorithm. The implemented and existing feature selection algorithms are evaluated on the real-world Pima Indian heart disease dataset, and the results indicate that the SFS algorithm outperforms the exhaustive and permutation-based feature selection algorithms. Overall, the results look promising, and a more effective heart disease detection model is developed with an accuracy of 99.3%.
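The greedy idea behind sequential feature selection can be sketched in a few lines. This is only an illustration of the forward variant under stated assumptions, not the implementation from the paper: the feature names, the `toy_score` function, and the stopping rule (stop when no candidate improves the score) are all hypothetical. In practice the score would be something like cross-validated accuracy of the classifier trained on the candidate subset.

```python
from typing import Callable, FrozenSet, Iterable, List


def sequential_forward_selection(
    candidates: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
    max_features: int,
) -> List[str]:
    """Greedily add the feature that most improves `score`.

    Stops when `max_features` is reached or no remaining candidate helps.
    """
    selected: List[str] = []
    remaining = list(candidates)
    best_score = score(frozenset())
    while remaining and len(selected) < max_features:
        # Score every one-feature extension of the current subset.
        trial = {f: score(frozenset(selected + [f])) for f in remaining}
        best_feature = max(remaining, key=lambda f: trial[f])
        if trial[best_feature] <= best_score:
            break  # no candidate improves the score, so stop early
        selected.append(best_feature)
        remaining.remove(best_feature)
        best_score = trial[best_feature]
    return selected


# Hypothetical stand-in for a real evaluation: useful features add to the
# score, irrelevant ones subtract a small penalty.
USEFUL = {"age": 0.10, "cholesterol": 0.15, "max_heart_rate": 0.20}


def toy_score(subset: FrozenSet[str]) -> float:
    gain = sum(USEFUL.get(f, 0.0) for f in subset)
    penalty = 0.05 * sum(1 for f in subset if f not in USEFUL)
    return 0.5 + gain - penalty


chosen = sequential_forward_selection(
    ["age", "patient_id", "cholesterol", "max_heart_rate", "visit_day"],
    toy_score,
    max_features=5,
)
print(chosen)
```

With this toy score the irrelevant columns are never picked, because adding either of them lowers the score once the useful features are in. Each round evaluates only the remaining candidates, so the number of evaluations grows roughly quadratically with the feature count, whereas an exhaustive search evaluates every subset.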

An Accurate and Efficient Scheduler for Hadoop MapReduce Framework

Raju

2018

IJEECS

MapReduce is the preferred computing framework for large-scale data analysis and processing applications. Hadoop is a MapReduce framework widely used across different communities due to its open-source nature. Cloud service providers such as Microsoft Azure HDInsight offer resources to customers, who pay only for what they use. However, a critical challenge for cloud service providers is meeting the service level agreement (SLA) requirement of user tasks (the task deadline). Currently, the onus is on the client to compute the amount of resources required to run a job on the cloud. This work presents a novel makespan model for the Hadoop MapReduce framework, namely OHMR (Optimized Hadoop MapReduce), to process data in real time and utilize system resources efficiently. OHMR presents an accurate model to compute job makespan time and a model to provision the amount of cloud resources required to meet the task deadline. OHMR first builds a profile for each job and computes the job's makespan time using a greedy approach. Furthermore, the Lagrange multipliers technique is applied to provision the amount of resources required to meet the task deadline. Experiments are conducted on the Microsoft Azure HDInsight cloud platform with different applications, such as text computing and bioinformatics, to evaluate the performance of OHMR over the existing model; the results show significant performance improvement in terms of computation time. Overall, good correlation is reported between practical and theoretical makespan values.
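A greedy makespan estimate of the kind mentioned above can be illustrated with a small simulation. This is a minimal sketch and not the OHMR model itself: it assumes the job profile is simply a list of task durations, assigns each task to whichever slot frees up first, and then finds the smallest slot count that meets a deadline by plain linear search rather than by the Lagrange multipliers technique the abstract describes. The function names and sample durations are hypothetical.

```python
import heapq
from typing import Optional, Sequence


def greedy_makespan(task_durations: Sequence[float], slots: int) -> float:
    """Makespan when each task, in order, goes to the earliest-free slot."""
    if slots < 1:
        raise ValueError("slots must be at least 1")
    free_at = [0.0] * slots  # min-heap of times at which each slot is free
    for duration in task_durations:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + duration)
    return max(free_at)


def min_slots_for_deadline(
    task_durations: Sequence[float], deadline: float
) -> Optional[int]:
    """Smallest slot count whose greedy makespan meets the deadline.

    Returns None if even one slot per task is not enough.
    """
    for slots in range(1, max(len(task_durations), 1) + 1):
        if greedy_makespan(task_durations, slots) <= deadline:
            return slots
    return None


# Hypothetical job profile: per-task durations in seconds.
profile = [4, 3, 2, 2, 1]
print(greedy_makespan(profile, 2))
print(min_slots_for_deadline(profile, 5))
```

The search is linear on purpose: greedy makespan is not guaranteed to shrink monotonically as slots are added, so a binary search could skip the smallest feasible count.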

Association Rule Data Mining in Agriculture – A Review

Vignesh, Vinutha

2020

Budget Constraint Scheduler for Big Data Using Hadoop MapReduce

Raju

2021

SN COMPUT. SCI.

Over the past few years, data production has increased significantly due to the growth of Internet-dependent technologies. Big data allows for an evolving paradigm change in data discovery and use. Big data is processed using the MapReduce framework in a scalable and distributed manner. Improving the performance of job scheduling across the nodes of a Hadoop cluster is an optimization problem. A scheduling algorithm is proposed to optimize MapReduce jobs by reducing the budget and execution time of cloud models. Experiments for the proposed method have been carried out on word count and sessionization applications of a web server log file of different sizes. Experimental results show that the average completion time is reduced in the proposed method when compared to the FIFO and fair schedulers.

In-Memory Cache and Intra-Node Combiner Approaches for Optimizing Execution Time in High-Performance Computing

Raju

2020

SN COMPUT. SCI.

Hadoop is a leading framework for processing and analyzing big data. MapReduce is a programming model for processing data in a parallel and distributed computing environment. One of the issues in HDFS is the number of disk I/O operations involved in fetching input data blocks from disk to memory for every task. Since data-intensive tasks are dominated by file access operations, an In-Memory Cache approach is proposed in this paper to reduce the number of disk I/O operations, which in turn reduces execution time and increases throughput. The intermediate data from the mappers, which contains a large number of duplicate pairs, must be delivered across the network to the reducers. This may increase network traffic and require high bandwidth. Therefore, an Intra-Node Combiner approach is proposed in this paper, which aggregates the duplicate pairs present in the intermediate data on every node before transferring them to the reducers. This may result in reduced network traffic and bandwidth usage. Experimental analysis of the proposed approaches has been carried out on Click count and Sessionization applications of a Web server log file of varied size. The experimental results reveal that the average execution time is decreased by 23% with the In-Memory Cache approach.
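The effect of combining duplicate pairs on a node before the shuffle can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the paper's implementation and not Hadoop API code: the log format (URL as the last whitespace-separated field), the function names, and the sample lines are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def map_clicks(log_lines: Iterable[str]) -> List[Pair]:
    """Emit (url, 1) per log line; the URL is assumed to be the last field."""
    return [(line.split()[-1], 1) for line in log_lines if line.strip()]


def combine_on_node(mapper_outputs: Iterable[List[Pair]]) -> List[Pair]:
    """Merge duplicate keys from all mappers on one node before the shuffle."""
    totals: Counter = Counter()
    for pairs in mapper_outputs:
        for key, value in pairs:
            totals[key] += value
    return sorted(totals.items())


def reduce_counts(shuffled: Iterable[Pair]) -> Dict[str, int]:
    """Sum the values per key, as a click-count reducer would."""
    result: Dict[str, int] = {}
    for key, value in shuffled:
        result[key] = result.get(key, 0) + value
    return result


# Two mappers on node A, one mapper on node B (hypothetical log lines).
node_a = [
    map_clicks(["u1 /home", "u2 /home", "u1 /cart"]),
    map_clicks(["u3 /home", "u2 /cart"]),
]
node_b = [map_clicks(["u4 /home"])]

pairs_without_combiner = sum(len(m) for m in node_a + node_b)
combined_a = combine_on_node(node_a)
combined_b = combine_on_node(node_b)
pairs_with_combiner = len(combined_a) + len(combined_b)

click_counts = reduce_counts(combined_a + combined_b)
print(pairs_without_combiner, pairs_with_combiner, click_counts)
```

The reducer output is the same with or without the combining step, because summation is associative and commutative; only the number of pairs crossing the network changes.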