2020
DOI: 10.1007/s10618-020-00703-x

For real: a thorough look at numeric attributes in subgroup discovery

Abstract: Subgroup discovery (SD) is an exploratory pattern mining paradigm that comes into its own when dealing with large real-world data, which typically involves many attributes, of a mixture of data types. Essential is the ability to deal with numeric attributes, whether they concern the target (a regression setting) or the description attributes (by which subgroups are identified). Various specific algorithms have been proposed in the literature for both cases, but a systematic review of the available options is m…

Cited by 16 publications (14 citation statements)
References: 44 publications
“…Before discussing predictive analytic SISSO model for zT, we analyze qualitatively all measured properties contributing to thermoelectric performance using a data-mining approach subgroup discovery (SGD). [30][31][32][33] SGD finds statistically exceptional subgroups in a dataset described by statements (selectors) of the kind (feature 1 < ) AND (feature 2 > ) AND ... . The features include only temperature and experimentally known materials properties listed in Table 1 and Table S1.…”
Section: Results (mentioning)
confidence: 99%
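The selector form quoted above, a conjunction of threshold conditions on numeric features, can be illustrated with a minimal sketch. The feature names, data values, and thresholds below are hypothetical (the quoted statement leaves the thresholds blank), and `subgroup_mask` is an illustrative helper, not a function from any cited work.

```python
import numpy as np

def subgroup_mask(data, selectors):
    """Return a boolean mask of the rows that satisfy every selector.

    `selectors` is a list of (feature, op, threshold) triples that are
    combined with AND, mirroring (feature 1 < a) AND (feature 2 > b).
    """
    ops = {"<": np.less, ">": np.greater}
    n_rows = len(next(iter(data.values())))
    mask = np.ones(n_rows, dtype=bool)
    for feature, op, threshold in selectors:
        mask &= ops[op](data[feature], threshold)
    return mask

# Hypothetical data: the names and values are assumptions for illustration.
data = {
    "temperature": np.array([300.0, 400.0, 500.0, 600.0]),
    "conductivity": np.array([1.0, 3.0, 2.0, 5.0]),
}
mask = subgroup_mask(
    data, [("temperature", "<", 550.0), ("conductivity", ">", 1.5)]
)
```

Rows selected by `mask` form the subgroup; a quality measure is then evaluated on the target values of those rows.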
“…For the quality measure it uses the Weighted Kullback-Leibler without dispersion, i.e., $\mathrm{WKL}_{\mu}(s) = \frac{n_s}{\sigma_d}\,(\mu_d - \mu_s)^2$ as described in Appendix 3, as the algorithm does not accept its dispersion-aware version used in Eq. (16).…”
Section: Methods (mentioning)
confidence: 99%
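The quality measure quoted above can be computed as in the following sketch. The excerpt does not define its symbols, so this assumes that n_s is the subgroup size, μ_s the subgroup target mean, and μ_d and σ_d the target mean and (population) standard deviation of the full dataset; the citing paper's Appendix 3 may define σ_d differently. The function name `wkl_mu` is hypothetical.

```python
import numpy as np

def wkl_mu(target, mask):
    """Dispersion-free quality n_s / sigma_d * (mu_d - mu_s)^2.

    Assumptions: sigma_d is the population standard deviation of the
    full target vector, and an empty subgroup or a constant target
    scores 0.0.
    """
    target = np.asarray(target, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    subgroup = target[mask]
    n_s = subgroup.size
    sigma_d = target.std()
    if n_s == 0 or sigma_d == 0.0:
        return 0.0
    mu_d = target.mean()
    mu_s = subgroup.mean()
    return float(n_s / sigma_d * (mu_d - mu_s) ** 2)

# Toy example: the subgroup covers the two largest target values.
score = wkl_mu([1.0, 2.0, 3.0, 4.0], [False, False, True, True])
```

Because the squared mean difference is multiplied by n_s, a subgroup scores highly only when it is both reasonably large and far from the dataset mean.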
“…The first reason for using greedy search to add one subgroup at the time, is its transparency, as it adds at each iteration the locally best subgroup found by the beam search. Beam search, on the other hand, was empirically shown, in the context of subgroup discovery for numeric targets, to be very competitive in terms of quality when compared to a complete search with an associated speedup improvement [16]. Also, its straightforward implementation allows to easily extend this framework to other types of targets, not just numeric.…”
Section: The SSD++ Algorithm (mentioning)
confidence: 99%
“…The first reason for using greedy search to add one subgroup at the time, is its transparency, as it adds at each iteration the locally best subgroup found by the beam search. Beam search, on the other hand, was empirically shown, in the context of subgroup discovery for numeric targets, to be very competitive in terms of quality when compared to a complete search with an associated speedup improvement [14]. Also, its straightforward implementation allows to easily extend this framework to other types of targets, not just numeric.…”
Section: The SSD++ Algorithm (mentioning)
confidence: 99%
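The beam search mentioned in these statements can be sketched in a few lines for a numeric target. This is a generic illustration and not the SSD++ implementation: the quality function (square root of coverage times mean shift), the quartile-based thresholds, and all function names are assumptions, and the outer greedy loop that adds one subgroup at a time is omitted.

```python
import numpy as np

def quality(y, mask):
    """Stand-in quality (an assumption): sqrt(coverage) * mean shift."""
    n = int(mask.sum())
    if n == 0:
        return 0.0
    return float(np.sqrt(n) * (y[mask].mean() - y.mean()))

def candidate_conditions(X):
    """All (column, op, threshold) conditions at each column's quartiles."""
    conds = []
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            conds.append((j, "<=", float(t)))
            conds.append((j, ">", float(t)))
    return conds

def condition_mask(X, cond):
    """Boolean mask of the rows satisfying a single condition."""
    j, op, t = cond
    return X[:, j] <= t if op == "<=" else X[:, j] > t

def beam_search(X, y, width=3, depth=2):
    """Return (best quality, description) over conjunctions up to `depth`."""
    conds = candidate_conditions(X)
    beam = [((), np.ones(len(y), dtype=bool))]  # start from the empty description
    best = (0.0, ())
    for _ in range(depth):
        scored = []
        for desc, mask in beam:
            for cond in conds:
                if cond in desc:
                    continue
                new_mask = mask & condition_mask(X, cond)
                scored.append((quality(y, new_mask), desc + (cond,), new_mask))
        # Keep only the `width` best refinements for the next level.
        scored.sort(key=lambda item: item[0], reverse=True)
        beam = [(desc, mask) for _, desc, mask in scored[:width]]
        if scored and scored[0][0] > best[0]:
            best = (scored[0][0], scored[0][1])
    return best

# Toy data: the target is high only for the two largest feature values.
X = np.arange(1.0, 9.0).reshape(-1, 1)
y = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.0, 10.0])
best_quality, best_description = beam_search(X, y)
```

Only `width` candidates survive each level, which is what makes the search heuristic rather than complete: a description whose prefix falls outside the beam is never explored.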