With the development of information technology, the volume of electronically stored data is growing rapidly. The data processed by many applications will routinely cross the petabyte threshold, which in turn increases computational requirements. Efficient processing algorithms and implementation techniques are key to meeting the scalability and performance requirements of such scientific data analyses. To this end, we analyze several MapReduce programs and a parallel clustering algorithm (PKMeans) on a Hadoop cluster using the MapReduce model. In this experiment we verify and validate MapReduce applications including wordcount, grep, terasort, and the parallel K-Means clustering algorithm. We find that execution time decreases as the number of nodes increases, although some interesting cases were also observed during the experiment; the performance changes were recorded and plotted as performance graphs. This experiment is essentially a research study of the above MapReduce applications, and it also verifies and validates the MapReduce programming model for the parallel K-Means algorithm on a four-node Hadoop cluster.
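As a rough illustration of how K-Means maps onto the MapReduce model, the sketch below runs one iteration in plain Python: the map step assigns each point to its nearest centroid, and the reduce step averages the points of each cluster. It is not the PKMeans implementation evaluated in the paper and does not use Hadoop; all function names and the sample data are assumptions made for this example.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Map: emit a (cluster_index, point) pair for every input point."""
    for point in points:
        yield nearest(point, centroids), point


def reduce_phase(pairs, centroids):
    """Reduce: average the points grouped under each cluster index."""
    groups = defaultdict(list)
    for idx, point in pairs:
        groups[idx].append(point)
    new_centroids = list(centroids)  # an empty cluster keeps its old centroid
    for idx, members in groups.items():
        new_centroids[idx] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return new_centroids


def kmeans_iteration(points, centroids):
    """Run one map step followed by one reduce step."""
    return reduce_phase(map_phase(points, centroids), centroids)


if __name__ == "__main__":
    # Hypothetical sample data, chosen only to make the two clusters obvious.
    points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    centroids = [(0.0, 0.0), (10.0, 10.0)]
    print(kmeans_iteration(points, centroids))
```

In a Hadoop job, the map calls would run in parallel over input splits and the framework would group the emitted pairs by cluster index before the reduce step; repeating the iteration until the centroids stop changing gives the full algorithm.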
In remote health care monitoring applications, the medical data collected from biomedical sensors must be transmitted to the nearest gateway for further processing. Data transmission accounts for a significant share of the transmitter's power consumption and increases network traffic. In this paper we propose a low-complexity, rule-engine-based architecture for health care data acquisition and smart transmission, which uses the IEEE 802.15.4 standard to transfer data to the gateway. The power consumed and the network traffic generated by the device can be reduced by event-based transmission rather than continuous transmission of data. We developed two rule engines, a static rule engine and an adaptive rule engine, which decide whether to transmit the collected data based on important features extracted from it, thereby saving power. An ECG data acquisition and transmission architecture is considered. The metrics used for performance analysis are the amount of power saved and the reduction in network traffic. The proposed rule engine is shown to give a significant reduction in energy consumption and in the network traffic generated.
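The idea of event-based transmission can be illustrated with a minimal static rule, sketched below under stated assumptions: the extracted feature is taken to be heart rate derived from the ECG R-R interval, and the 60 to 100 bpm band is an illustrative threshold, not a value from the paper. The function names are hypothetical, and the adaptive rule engine is not modeled because the abstract does not describe how it adapts.

```python
def heart_rate_bpm(rr_interval_s):
    """Convert an R-R interval in seconds to heart rate in beats per minute."""
    return 60.0 / rr_interval_s


def should_transmit(rr_interval_s, low_bpm=60.0, high_bpm=100.0):
    """Static rule: transmit only when heart rate leaves the assumed normal band."""
    bpm = heart_rate_bpm(rr_interval_s)
    return bpm < low_bpm or bpm > high_bpm


def select_events(rr_intervals, low_bpm=60.0, high_bpm=100.0):
    """Keep only the samples that the static rule marks for transmission."""
    return [rr for rr in rr_intervals if should_transmit(rr, low_bpm, high_bpm)]


def traffic_reduction(rr_intervals, low_bpm=60.0, high_bpm=100.0):
    """Fraction of samples that are not transmitted under the rule."""
    sent = len(select_events(rr_intervals, low_bpm, high_bpm))
    return 1.0 - sent / len(rr_intervals)


if __name__ == "__main__":
    # Hypothetical R-R intervals in seconds; 0.5 s and 1.2 s fall outside the band.
    samples = [0.8, 0.5, 1.2, 0.75]
    print(select_events(samples))
    print(traffic_reduction(samples))
```

Under a rule like this, samples inside the normal band are never sent, so both the radio's active time and the traffic at the gateway shrink in proportion to how often the signal is unremarkable.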
In recent years, data volumes have grown ever larger. It is difficult to measure the total volume of structured and unstructured data that requires machine-based systems and technologies in order to be fully analyzed. Efficient implementation techniques are key to meeting the scalability and performance requirements entailed in such scientific data analysis. To this end, this paper analyzes the sequential Support Vector Machine in WEKA and various MapReduce programs, including a parallel Support Vector Machine, on a Hadoop cluster; the algorithms are thereby verified and validated on the Hadoop cluster using the MapReduce model. The performance of these applications is reported with respect to execution time (or training time) and number of nodes. Experimental results show that execution time decreases as the number of nodes increases. This experiment is essentially a research study of the above MapReduce applications.