2019
DOI: 10.32604/cmc.2019.03708

Using Imbalanced Triangle Synthetic Data for Machine Learning Anomaly Detection

Abstract: The extreme imbalanced data problem is the core issue in anomaly detection. The amount of abnormal data is so small that we cannot get adequate information to analyze it. The mainstream methods focus on taking full advantage of the normal data; their discrimination rule is that any data not belonging to the normal data distribution is an anomaly. From the view of data science, we concentrate on the abnormal data and generate artificial abnormal samples by a machine learning method. In this kind of techno…
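
The abstract is truncated before it describes how the artificial abnormal samples are produced, so the paper's own procedure is not shown here. As a rough illustration of the general idea only, the sketch below generates synthetic minority points by interpolating between pairs of existing abnormal samples, which is the generic SMOTE-style approach and not the authors' method. The function name `interpolate_minority` and the toy data are hypothetical.

```python
import random

def interpolate_minority(samples, n_new, seed=0):
    """Create n_new synthetic points, each lying on the segment between
    two distinct, randomly chosen minority (abnormal) samples.
    Generic SMOTE-style interpolation; an assumption, not the paper's method."""
    if len(samples) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(samples, 2)   # two distinct abnormal samples
        t = rng.random()                # position along the segment, in [0, 1)
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic

# Hypothetical toy data: three abnormal points in two dimensions.
abnormal = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = interpolate_minority(abnormal, n_new=5)
```

Each synthetic point stays inside the region spanned by the original abnormal samples, so the minority class can be enlarged without inventing values outside the observed range.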

Cited by 53 publications
(22 citation statements)
References 13 publications
“…Based on several studies, we found that a commonly used dataset for health data mining was the Pima Indians Diabetes Dataset from the University of California, Irvine (UCI) Machine Learning Database [24][25][26][27][28][29]. The datasets consist of several medical predictor (independent) variables and one target (dependent) variable, Outcome.…”
Section: Methods (mentioning)
confidence: 99%
“…In order to check the performance of the upgraded network, the experimental dataset of [23,24] has been processed, representing a good dataset for testing LSTM neural networks. The experimental dataset [24] has been adopted in the literature for different data mining tests [24][25][26][27][28][29]. Specifically, in reference [25] the K-means algorithm has been applied for predicting diabetes, in reference [26] some authors applied synthetic data in order to balance a machine learning dataset model, while references [27][28][29] have analyzed different machine learning algorithms for diabetes prediction.…”
Section: Introduction (mentioning)
confidence: 99%
“…Thus batch learning consumes lots of time and space resources, resulting in low efficiency. Besides, in many real-world situations, such as anomaly detection [1] and stock forecasting [2], the data is growing rapidly and evolving. Sometimes the model needs to be trained without waiting for all the data collected.…”
Section: Introduction (mentioning)
confidence: 99%
“…This has made the application range of wireless sensor networks more widely expanded, including smart home, intelligent agriculture, and other fields [21][22][23][24][25]. With the development, sensing device has also developed rapidly; the current Internet is experiencing a trend from centralization to marginalization, Cloud computing [26,27], Edge computing [28,29], and Fog computer [30][31][32] which correspond to the new computational model proposed for such development [26,28,33,34,35]. With the rapid rise of artificial intelligence technology [36,37], the combination of artificial intelligence and Internet of Things (IoT) has made it a longer development [38][39][40], which has become the focus of researchers.…”
Section: Introduction (mentioning)
confidence: 99%