The number of wireless sensor network (WSN) deployments has grown exponentially in recent years. Because sensor nodes are small and cost-effective, WSNs are attracting many industries, with applications such as environmental monitoring, building security, and precision agriculture, among several other fields. However, WSNs face serious security threats, since most of them are deployed unattended and in hostile environments. To secure data processing in WSNs, many techniques have been proposed to protect the privacy of data as it is transferred from the sensors to the base station. This work focuses on attack detection, an essential task for securing both the network and its data. Anomaly detection is a key challenge in ensuring security and preventing malicious attacks in wireless sensor networks. Researchers have applied various machine learning techniques to detect anomalies using offline learning algorithms; online learning classifiers, by contrast, have not been thoroughly addressed in the literature. Our aim is to provide an intrusion detection model compatible with the characteristics of WSNs. The model is built on the information gain ratio and the online Passive Aggressive classifier. First, the information gain ratio is used to select the relevant features of the sensor data. Second, the online Passive Aggressive algorithm is trained to detect and classify different types of Denial of Service attacks. The experiment was conducted on the wireless sensor network-detection system (WSN-DS) dataset. The proposed model, ID-GOPA, achieves a detection rate of 96% in determining whether the network is in its normal mode or exposed to any type of attack. The detection accuracy is 86%, 68%, 63%, and 46% for scheduling, grayhole, flooding and blackhole attacks, respectively, in addition to 99% for normal traffic.
These results show that our model, based on online learning, can provide good anomaly detection for WSNs and can replace offline learning in some cases.
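The two stages named in this abstract, feature scoring by information gain ratio followed by an online Passive Aggressive classifier, can be illustrated with a minimal sketch. It is not the ID-GOPA implementation: it assumes discrete feature values, a binary normal-versus-attack labelling in {-1, +1}, and the PA-I update rule with an aggressiveness parameter `C`. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by the feature's own entropy."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(feature)
    if split_info == 0:
        return 0.0  # a constant feature carries no information
    return (entropy(labels) - conditional) / split_info


def pa_train(samples, labels, C=1.0):
    """One online pass of the PA-I update (assumed variant); labels must be -1 or +1."""
    w = [0.0] * len(samples[0])
    for x, y in zip(samples, labels):
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        loss = max(0.0, 1.0 - margin)  # hinge loss
        norm_sq = sum(xi * xi for xi in x)
        if loss > 0.0 and norm_sq > 0.0:
            tau = min(C, loss / norm_sq)  # step size capped by C
            w = [wi + tau * y * xi for wi, xi in zip(w, x)]
    return w


def pa_predict(w, x):
    """Classify one sample by the sign of the linear score."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
```

In a pipeline of this shape, features would be ranked by `gain_ratio`, the top-scoring ones kept, and the retained columns fed to `pa_train` one sample at a time, so the model never needs the full dataset in memory.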
With the explosive growth of data, using data mining techniques to mine association rules and uncover useful information hidden in large datasets has become ever more important. Many existing data mining techniques derive association rules by mining frequent itemsets, but with the rapid arrival of the big data era, traditional data mining algorithms can no longer meet the needs of large-scale data analysis. Recently, the PrePost algorithm, a new algorithm for mining frequent itemsets based on the idea of N-lists, has been proposed; in most cases it outperforms other state-of-the-art algorithms. With this in mind, we present the HPrePostPlus algorithm, an improved version of PrePost based on Hadoop that uses a HashMap to traverse the PPC tree efficiently and to improve the process of creating the N-lists associated with 1-itemsets. We also exploit the characteristics of Hadoop in order to process large data. Experiments show that the HPrePostPlus algorithm outperforms the state-of-the-art methods in terms of performance and scalability.
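To make the N-list idea concrete, the following single-machine sketch builds a PPC tree from a handful of transactions and collects, for each frequent 1-itemset, its N-list of (pre-order, post-order, count) codes in a dictionary. It does not reproduce HPrePostPlus or its Hadoop distribution. The plain `dict` only stands in for the hash map the abstract mentions, and all names are hypothetical.

```python
from collections import Counter, defaultdict


class Node:
    """One PPC-tree node: an item, its count, and pre/post-order ranks."""

    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}
        self.pre = -1
        self.post = -1


def build_ppc_tree(transactions, min_support):
    """Insert each transaction's frequent items, ordered by descending support."""
    freq = Counter(item for t in transactions for item in set(t))
    root = Node(None)
    for t in transactions:
        frequent = [i for i in set(t) if freq[i] >= min_support]
        frequent.sort(key=lambda i: (-freq[i], i))  # ties broken by item name
        node = root
        for item in frequent:
            if item not in node.children:
                node.children[item] = Node(item)
            node = node.children[item]
            node.count += 1
    return root


def build_n_lists(root):
    """Traverse the tree once, ranking nodes and grouping them by item."""
    by_item = defaultdict(list)
    ranks = {"pre": 0, "post": 0}

    def visit(node):
        node.pre = ranks["pre"]
        ranks["pre"] += 1
        if node.item is not None:
            by_item[node.item].append(node)  # appended in pre-order
        for child in node.children.values():
            visit(child)
        node.post = ranks["post"]
        ranks["post"] += 1

    visit(root)
    return {item: [(n.pre, n.post, n.count) for n in nodes]
            for item, nodes in by_item.items()}


transactions = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
n_lists = build_n_lists(build_ppc_tree(transactions, min_support=2))
```

The support of a 1-itemset is the sum of the counts in its N-list, so supports can be read off without rescanning the transactions.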
Entity matching (EM), the task of identifying records that refer to the same entity, is a critical task when constructing data warehouses. It is often very expensive in running time because records must be compared pairwise, and the problem becomes more acute with large-scale data. We propose a new parallel algorithm that partitions the data using the K-Medoid algorithm, implemented with the Spark framework. Computational experiments show that we can improve the solution of a set of instances in a reduced execution time.
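The cost argument above (pairwise comparison of all records) and the partitioning remedy can be sketched as follows. This is an illustrative single-machine version, not the authors' Spark implementation. The character-bigram Jaccard distance, the naive choice of the first `k` records as initial medoids, and all function names are assumptions made for the example.

```python
from itertools import combinations


def bigrams(text):
    """Set of lowercase character bigrams; falls back to the whole string if too short."""
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)} or {s}


def distance(a, b):
    """Jaccard distance between the bigram sets of two strings (assumed metric)."""
    x, y = bigrams(a), bigrams(b)
    return 1.0 - len(x & y) / len(x | y)


def k_medoids(records, k, iterations=10):
    """Alternate assignment and medoid update; returns clusters as lists of record indices."""
    medoids = list(range(k))  # naive deterministic initialisation
    clusters = {}
    for _ in range(iterations):
        clusters = {m: [] for m in medoids}
        for i, record in enumerate(records):
            nearest = min(medoids, key=lambda m: distance(record, records[m]))
            clusters[nearest].append(i)
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # keep the old medoid if its cluster emptied out
                new_medoids.append(m)
                continue
            new_medoids.append(min(
                members,
                key=lambda c: sum(distance(records[c], records[j]) for j in members)))
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return [members for members in clusters.values() if members]


def candidate_pairs(blocks):
    """Only records that share a block are compared with each other."""
    return [pair for block in blocks for pair in combinations(block, 2)]


records = ["John Smith", "Acme Corp", "Jon Smith", "ACME Corporation", "John Smyth"]
blocks = k_medoids(records, k=2)
pairs = candidate_pairs(blocks)
```

With n records, exhaustive matching needs n(n-1)/2 comparisons; restricting comparisons to records within the same cluster shrinks that number, and each cluster can be processed independently, which is what makes the approach amenable to parallel execution.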