2015
DOI: 10.1016/j.dss.2015.05.002

Algorithm for the detection of outliers based on the theory of rough sets

Abstract: Outliers are objects that show abnormal behavior with respect to their context or that have unexpected values in some of their parameters. In decision-making processes, information quality is of the utmost importance. In specific applications, an outlying data element may represent an important deviation in a production process or a damaged sensor. Therefore, the ability to detect these elements could make the difference between making a correct or an incorrect decision. This task is complicated by the large s…

Cited by 54 publications (17 citation statements)
References 40 publications
“…Several combinations of values for the r and m parameters were tested, and the best results were obtained with both set to ten (10). Although larger values (30, 50 or 70) for r and m yield fewer outliers, the time series' shape becomes stair-like, losing its similarity to the shape of the original time series (Figure 3, graph a). Thus, using these large window sizes would be disadvantageous because an analysis of the last part of the time series (roughly values 30950 to 31000) becomes largely impossible, as observed in Figure 3 (graphs d, e and f).…”
Section: Results (mentioning)
confidence: 99%
“…However, its computational implementation is complicated by its exponential order. An extension of the theoretical framework of the previous proposition is presented in [6], in which an outlier detection algorithm is implemented based on Pawlak rough sets (the Pawlak rough sets algorithm) with a nonexponential order of temporal and spatial complexity. In [6], a method for the detection of outliers has been proposed with a simple and rigorous theoretical setup, starting from a definition of outliers that is simple, intuitive, and computationally viable for large datasets.…”
Section: Introduction (mentioning)
confidence: 99%
“…An extension of the theoretical framework of the previous proposition is presented in [6], in which an outlier detection algorithm is implemented based on Pawlak rough sets (the Pawlak rough sets algorithm) with a nonexponential order of temporal and spatial complexity. In [6], a method for the detection of outliers has been proposed with a simple and rigorous theoretical setup, starting from a definition of outliers that is simple, intuitive, and computationally viable for large datasets. From this method, an efficient algorithm for outlier mining has been developed, conceptually based on a novel and original approach using rough set theory, which has not been applied in any previous category of classification for the methods of rough set detection.…”
Section: Introduction (mentioning)
confidence: 99%
“…It is nevertheless very likely that two e-mails contain the same words from the list of 10 words, yet one is spam and the other is nonspam. In this case, the equivalence relation is not … A universe partitioned by an equivalence relation and a concept ⊆ that cannot be defined using [41,42].…”
Section: Rough Set Theory (mentioning)
confidence: 99%
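
The situation in the statement above, two e-mails that contain the same words but carry different labels, is the usual motivation for the lower and upper approximations of rough set theory. The following is a minimal Python sketch of that idea only. The four e-mails, their words, and the helper names `partition` and `approximations` are hypothetical illustrations; they come from neither the cited article nor the citing publication, and this is not the outlier detection algorithm itself.

```python
from collections import defaultdict


def partition(universe, key):
    """Split the universe into equivalence classes.

    Two objects fall into the same class when key() returns the same
    value for both, i.e. they are indiscernible on the chosen attributes.
    """
    classes = defaultdict(set)
    for obj in universe:
        classes[key(obj)].add(obj)
    return list(classes.values())


def approximations(classes, concept):
    """Return the (lower, upper) approximations of a concept.

    lower: union of the classes fully contained in the concept
    upper: union of the classes that intersect the concept
    """
    lower, upper = set(), set()
    for cls in classes:
        if cls <= concept:
            lower |= cls
        if cls & concept:
            upper |= cls
    return lower, upper


# Hypothetical toy data: e-mail id -> (listed words found in it, label).
emails = {
    "e1": (frozenset({"free", "offer"}), "spam"),
    "e2": (frozenset({"free", "offer"}), "nonspam"),  # same words as e1
    "e3": (frozenset({"meeting"}), "nonspam"),
    "e4": (frozenset({"winner"}), "spam"),
}

# E-mails with identical word sets are indiscernible.
classes = partition(emails, key=lambda e: emails[e][0])

# The concept to approximate: the set of spam e-mails.
spam = {e for e, (_, label) in emails.items() if label == "spam"}

lower, upper = approximations(classes, spam)
boundary = upper - lower

print("certainly spam:", sorted(lower))
print("possibly spam: ", sorted(upper))
print("boundary:      ", sorted(boundary))
```

In this toy example the boundary region is non-empty because e1 and e2 cannot be told apart by their words, so the spam concept cannot be defined exactly by the equivalence classes; it can only be bracketed between its lower and upper approximations.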