2019
DOI: 10.3390/s19204536

A Novel Sensor Data Pre-Processing Methodology for the Internet of Things Using Anomaly Detection and Transfer-By-Subspace-Similarity Transformation

Abstract: The Internet of Things (IoT) and sensors are becoming increasingly popular, especially in monitoring large and ambient environments. Applications that embrace IoT and sensors often require mining the data feeds that are collected at frequent intervals for intelligence. Despite the fact that such sensor data are massive, most of the data contents are identical and repetitive; for example, human traffic in a park at night. Most of the traditional classification algorithms were originally formulated decades ago, …

Cited by 13 publications
(10 citation statements)
References 25 publications
“…In [14], it was recommended fresh IDS by utilizing Transfer by Subspace Similarity classification algorithm that improves the gathered data quality by computing the relevance correlations between variable and filling the missing values in the dataset. Assessment conducted on NSL-KDD database demonstrated that the method is highly successful compared with that which is reliant on Transfer by Subspace Similarity technique.…”
Section: Related Work (mentioning)
confidence: 99%
“…In general, vast amounts of sensing data from industrial IoT deployed in poor environments are frequently generated by repetitive and irregular data owing to various transmission errors, sensor malfunctions, and cases of external interference, which can eventually lead to poor performance due to incorrect predictions being made in case-based reasoning [19, 20]. Therefore, we propose the use of data collection, data preprocessing, and ANN to effectively improve prediction performance through the accurate pattern learning of sensed data [21].…”
Section: Data Mining (mentioning)
confidence: 99%
“…Choose a state from the state set as the initial state S(t);
while t is less than the user's termination iterations do
    Choose an action from the action set {G, H, I} by ε-greedy strategy according to the definition of action in Part C of Section 3;
    Execute the selected action on the current state S(t) to jump to the next state S(t + 1);
    Perform crossover operation with global variable on the features contained in state S(t + 1);
    Calculate the fitness of the multidimensional data discretization scheme after crossover operation using equation (5);
    Measure the corresponding reward using equation (11) according to the definition of reward in Part C of Section 3;
    Update crossover Q-Table using equation (6);
    if the fitness of the multidimensional data discretization scheme > local variable do
        Update local variable with the fitness of the multidimensional data discretization scheme;
    end
    Perform crossover operation in P(t);
    Calculate the fitness of each individual in P(t) using equation (8);
    Update global variable with the optimal individual fitness value in P(t);
    S(t) ← S(t + 1);
    Choose an action from the action set {G, H, I} by ε-greedy strategy according to the definition of action in Part C of Section 3;
    Execute the selected action on the current state S(t) to jump to the next state S(t + 1);
    Perform mutation operation on the features contained in state S(t + 1);
    Calculate the fitness of the multidimensional data discretization scheme after mutation operation using equation (5);
    Measure the corresponding reward using equation (11) according to the definition of reward in Part C of Section 3;
    Update mutation Q-Table using equation (6);
    if the fitness of the multidimensional data discretization scheme > local variable do
        Update local variable with the fitness of the multidimensional data discretization scheme;
    end
    Perform mutation operation in P(t);
    Calculate the fitness of each individual in P(t) using equation (8);
    Update global variable with the optimal individual fitness value in P(t);
    t ← t + 1;
end
Return Max(global variable, local variable);
end
ALGORITHM 2: RLGA algorithm process.…”
Section: Configuration Of Experimental (mentioning)
confidence: 99%
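
The quoted RLGA pseudocode relies on two reinforcement-learning steps: picking an action from {G, H, I} with an epsilon-greedy strategy, and updating a Q-Table. The sketch below illustrates only those two steps. It assumes the textbook Q-learning update rule, since the citing paper's equations (5), (6), (8), and (11) are not reproduced in the excerpt; the function names, the dictionary-based Q-Table, and the `alpha` and `gamma` values are hypothetical.

```python
import random

ACTIONS = ["G", "H", "I"]  # action labels taken from the quoted pseudocode


def choose_action(q_table, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    # Unseen (state, action) pairs default to a Q-value of 0.0.
    return max(ACTIONS, key=lambda a: q_table.get((state, a), 0.0))


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Textbook Q-learning update (assumed; not the paper's equation (6))."""
    best_next = max(q_table.get((next_state, a), 0.0) for a in ACTIONS)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)


rng = random.Random(0)
q = {}
# One illustrative step: action "H" in state 0 earned a reward of 1.0.
q_update(q, state=0, action="H", reward=1.0, next_state=1)
# With epsilon = 0 the choice is purely greedy.
greedy = choose_action(q, state=0, epsilon=0.0, rng=rng)
```

In the quoted algorithm, separate Q-Tables are kept for the crossover and mutation operators; under the assumptions above, that would correspond to calling `q_update` on two independent dictionaries.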
“…These are mainly from various types of sensors, with high-dimensional, incomplete, random, fuzzy, and strong interference and other characteristics [6]. Despite the growing body of artificial intelligence research, how to extract and analyze valuable information from these massive amounts of complex sensor data is still a huge challenge in the field of artificial intelligence [7][8][9]. As one of the most influential data preprocessing technologies, feature discretization can reduce the complexity of data by transforming the continuous features in massive data to discrete features and obtain shorter, more accurate, and more comprehensible rules, so as to improve the efficiency of data mining and machine learning [10][11][12][13][14][15][16].…”
Section: Introduction (mentioning)
confidence: 99%
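
Feature discretization, as described in the statement above, maps continuous feature values to a small set of discrete labels. A minimal sketch follows, assuming the simplest scheme, equal-width binning; this is not the method of the cited or citing paper, and the function names and sample readings are hypothetical.

```python
from bisect import bisect_right


def equal_width_cut_points(values, n_bins):
    """Return the n_bins - 1 interior cut points that split [min, max] evenly."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def discretize(values, cut_points):
    """Map each continuous value to the index of the interval it falls in."""
    return [bisect_right(cut_points, v) for v in values]


readings = [0.0, 2.5, 5.0, 7.5, 10.0]   # made-up sensor readings
cuts = equal_width_cut_points(readings, 4)
labels = discretize(readings, cuts)
print(cuts)    # [2.5, 5.0, 7.5]
print(labels)  # [0, 1, 2, 3, 3]
```

A value that equals a cut point is assigned to the higher interval, and the maximum reading falls in the last interval.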