Anomaly detection has gained considerable attention in the past few years. Emerging technologies such as the Internet of Things (IoT) are among the most critical sources of data streams, continuously producing massive amounts of data from numerous applications. Examining these data to detect suspicious events can reduce functional threats and prevent unforeseen issues that cause application downtime. Because data stream characteristics are dynamic, many problems remain unresolved. Methods in the existing literature have been designed to evaluate specific anomalous behaviors in IoT data stream sources; however, comprehensive studies that cover all aspects of IoT data processing are lacking. This paper attempts to fill this gap by providing a comprehensive picture of state-of-the-art techniques for the major problems and core challenges in IoT data. The nature of the data, anomaly types, learning modes, window models, datasets, and evaluation criteria are also presented. Research challenges related to data evolution, feature evolution, windowing, ensemble approaches, the nature of the input data, data complexity and noise, parameter selection, data visualization, data heterogeneity, accuracy, and large-scale and high-dimensional data are investigated. Finally, the challenges that require substantial research effort and future directions are summarized.
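The window model and statistical detection mentioned above can be illustrated with a minimal sketch, assuming a univariate sensor stream: a count-based sliding window keeps the most recent readings, and a new reading is flagged when its z-score against that window exceeds a threshold. The class name `SlidingWindowZScore` and the default `window_size` and `threshold` values are hypothetical choices for illustration, not a technique taken from the surveyed paper.

```python
from collections import deque
import math


class SlidingWindowZScore:
    """Hypothetical detector: flags a reading as anomalous when it lies more
    than `threshold` sample standard deviations from the mean of the last
    `window_size` readings. Window size and threshold are assumptions."""

    def __init__(self, window_size=20, threshold=3.0):
        # deque with maxlen drops the oldest reading automatically (count-based window)
        self.window = deque(maxlen=window_size)
        self.threshold = threshold

    def update(self, x):
        is_anomaly = False
        # Only score once the window holds at least two readings
        if len(self.window) >= 2:
            n = len(self.window)
            mean = sum(self.window) / n
            var = sum((v - mean) ** 2 for v in self.window) / (n - 1)
            std = math.sqrt(var)
            if std > 0 and abs(x - mean) / std > self.threshold:
                is_anomaly = True
        # Add x afterwards so the score uses only past readings
        self.window.append(x)
        return is_anomaly


detector = SlidingWindowZScore(window_size=20, threshold=3.0)
stream = [20.0, 20.5, 19.8, 20.2, 20.1, 35.0, 20.0]
flags = [detector.update(v) for v in stream]
print(flags)  # [False, False, False, False, False, True, False]
```

In this sketch the anomalous reading still enters the window, which inflates the spread for later readings; a real deployment might exclude flagged values or use a robust statistic instead.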
The world is progressing towards a new era of connectivity in which billions of sensors are connected over a network called the Internet of Things (IoT). IoT enables a wide range of physical objects and devices to be connected and monitored with insufficient spatial and temporal detail. Despite IoT's potential to improve multiple application domains, anomalies in device behavior pose a significant challenge, especially in the smart city domain. Many research works have been devoted to detecting such anomalous behaviors; however, a comprehensive review of anomaly detection techniques that use statistical and machine learning methods in the smart city domain is lacking. This work aims to fill this gap by presenting such a review. The paper explains the essential background on IoT, followed by a review of IoT anomaly detection techniques and their challenges, types, and detection modes. It then summarizes related work on smart cities. Finally, open challenges and future directions are highlighted.
As applications generate massive amounts of data streams, methods to analyze and cluster these data have become a critical research field for knowledge discovery. The primary goal of data stream clustering is to gain insight into incoming data, which requires recognizing all possible patterns in data streams that arrive at variable rates, with varying structures, and evolve over time. Data stream analysis has become a vital research area because of the inevitably evolving nature of data streams and their wide range of application domains. Existing data stream clustering algorithms rely on various data summarization structures, ranging from grid projection to buffers of Core-Micro and Macro clusters. However, the static assumptions of these summarization structures were found to affect clustering quality. To fill this gap, this research develops BOCEDS TSHC, an online clustering algorithm for evolving data streams based on a tempo-spatial hypercube. The tempo-spatial hypercube (TSHC) adds dimensions to the data summarization, giving it more degrees of freedom. Adding TSHC to Buffer-based Online Clustering for Evolving Data Stream (BOCEDS) yields a superior evolving data stream clustering algorithm. Evaluation on both real-world and synthetic datasets showed that BOCEDS TSHC outperforms the baseline algorithms on most clustering metrics.
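To make the idea of adding a temporal dimension to grid-based summarization more concrete, here is a minimal, hypothetical sketch. It is not the BOCEDS TSHC algorithm, whose internals are not described here. Each point is mapped to a hypercube cell indexed by its discretized spatial coordinates plus a time bucket, and each cell stores only a count and a per-dimension linear sum, so a centroid can be recovered without keeping raw points. The class name `TempoSpatialGrid` and the `cell_width` and `time_width` parameters are assumptions for illustration.

```python
from collections import defaultdict
import math


class TempoSpatialGrid:
    """Hypothetical summary structure (not BOCEDS TSHC): cells are keyed by
    discretized spatial coordinates plus a time bucket, and each cell keeps a
    point count and a per-dimension linear sum."""

    def __init__(self, cell_width=1.0, time_width=10.0):
        self.cell_width = cell_width    # assumed spatial grid resolution
        self.time_width = time_width    # assumed length of one time bucket
        self.cells = defaultdict(lambda: {"n": 0, "ls": None})

    def _key(self, point, t):
        # Spatial cell indices followed by the time-bucket index
        spatial = tuple(math.floor(x / self.cell_width) for x in point)
        return spatial + (math.floor(t / self.time_width),)

    def insert(self, point, t):
        cell = self.cells[self._key(point, t)]
        cell["n"] += 1
        if cell["ls"] is None:
            cell["ls"] = list(point)
        else:
            cell["ls"] = [a + b for a, b in zip(cell["ls"], point)]

    def centroids(self, min_count=1):
        # Centroid = linear sum / count, for cells with enough points
        return {
            key: [s / cell["n"] for s in cell["ls"]]
            for key, cell in self.cells.items()
            if cell["n"] >= min_count
        }


grid = TempoSpatialGrid(cell_width=1.0, time_width=10.0)
grid.insert((0.2, 0.3), t=1)
grid.insert((0.4, 0.5), t=2)
grid.insert((5.1, 5.2), t=3)
grid.insert((0.3, 0.2), t=15)  # same spatial cell as the first two, later time bucket
print(sorted(grid.centroids(min_count=2)))  # [(0, 0, 0)]
```

Because the time bucket is part of the cell key, points that share a spatial cell but arrive in different periods are summarized separately, which is one simple way an extra temporal dimension can give the summary more freedom to follow an evolving stream.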