Summary
Wireless sensor networks have become pervasive in recent years and generate large volumes of data every second. Detecting faulty sensors through outlier detection on data at this scale is a challenging task. Most existing outlier detection techniques for wireless sensor networks consider only the contents of the data source and ignore correlations among different data attributes. Moreover, these methods do not scale to big data. To address these two limitations, this paper proposes an outlier detection approach based on correlation and dynamic SMO (sequential minimal optimization) regression that is scalable to big data. First, correlation is used to identify strongly correlated attributes; then point anomalous nodes are detected using dynamic SMO regression. The Hadoop MapReduce framework is used for fast processing of big data. The experimental analysis demonstrates that the proposed approach efficiently detects point and contextual anomalies and reduces the number of false alarms. The experiments use real data from sensors used in body sensor networks, taken from the Physionet database.
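The two-step idea in this abstract (pick a strongly correlated attribute, then flag readings that a regression on that attribute fails to predict) can be illustrated with a minimal sketch. This is not the paper's method: ordinary least squares stands in for dynamic SMO regression, there is no MapReduce stage, and the function names, the `min_corr` and `tolerance` thresholds, the clean training window `n_train`, and the toy readings are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def fit_line(xs, ys):
    """Least squares fit y = slope * x + intercept (a stand-in for SMO regression)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def detect_point_anomalies(target, candidates, n_train, min_corr=0.8, tolerance=5.0):
    """Return (chosen attribute name, indices of flagged readings after the training window)."""
    train_y = target[:n_train]
    # Step 1: score each candidate attribute by |correlation| on the training window.
    scores = {name: abs(pearson(vals[:n_train], train_y)) for name, vals in candidates.items()}
    best = max(scores, key=scores.get)
    if scores[best] < min_corr:
        return None, []  # no attribute is correlated strongly enough to predict from
    # Step 2: fit on the training window, then flag readings with a large prediction error.
    xs = candidates[best]
    slope, intercept = fit_line(xs[:n_train], train_y)
    flagged = [i for i in range(n_train, len(target))
               if abs(target[i] - (slope * xs[i] + intercept)) > tolerance]
    return best, flagged


# Synthetic toy readings (not Physionet data); index 7 of `pulse` is a spike.
heart_rate = [70, 72, 74, 76, 78, 80, 82, 84, 86, 88]
temperature = [36.6, 36.5, 36.7, 36.6, 36.5, 36.7, 36.6, 36.5, 36.7, 36.6]
pulse = [71, 73, 75, 77, 79, 81, 83, 120, 87, 89]

best, flagged = detect_point_anomalies(
    pulse, {"heart_rate": heart_rate, "temperature": temperature}, n_train=5
)
```

In this sketch the first five readings are assumed to be clean, `heart_rate` is selected as the predictor, and only the spike at index 7 is flagged. A sliding or periodically refitted training window would be closer to a "dynamic" regression, but the abstract does not specify how the model is updated.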
The world has already entered the information age. The huge growth of digital data has overwhelmed traditional systems and approaches. Big data touches almost all aspects of our lives, and data-driven discovery is an emerging paradigm for computing. The ever-growing data brings a tidal wave of opportunities and challenges in data capture, storage, manipulation, management, analysis, knowledge extraction, security, privacy and visualisation. Although the promise of big data appears genuine, a wide gap still exists between its potential and its realisation. In the last few years, there has been a huge surge in research efforts in academia as well as industry to better understand big data. This article discusses the following: (1) big data evolution, including a bibliometric study of academic and industry publications pertaining to big data during the period 2000–2017, (2) popular open-source big data stream processing frameworks and (3) prevalent research challenges that must be addressed to realise the true potential of big data.