Proceedings of the 2021 SIAM International Conference on Data Mining (SDM)
DOI: 10.1137/1.9781611976700.40
Better Short than Greedy: Interpretable Models through Optimal Rule Boosting

Cited by 8 publications (9 citation statements). References 0 publications.
“…This approach was pioneered by algorithms such as LRI (Weiss and Indurkhya, 2000) and SLIPPER (Cohen and Singer, 1999); the general framework of gradient boosting for rule learning was most clearly defined in ENDER (Dembczyński et al, 2010). Recent additions to this family include BOOMER (Rapp et al, 2020), which generalizes this approach to learning multi-label rules, and the algorithm of Boley et al (2021), which replaced the greedy search for the best addition to the rule set with an efficient exhaustive search.…”
Section: Covering Algorithms
confidence: 99%
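The contrast drawn in this statement, growing a rule greedily one condition at a time versus searching exhaustively for the single best rule to add, can be sketched on toy data. This is an illustration of the general idea only, not the actual algorithm of Boley et al (2021), which uses branch-and-bound rather than brute-force enumeration; the data and the squared-error gain criterion are assumptions.

```python
import itertools

# Toy binary data: rows are records, columns are binary features; labels in {-1, +1}.
X = [(1, 0, 1), (1, 1, 0), (0, 1, 1), (0, 0, 1),
     (1, 1, 1), (0, 1, 0), (1, 0, 0), (0, 0, 0)]
y = [+1, +1, -1, -1, +1, -1, +1, -1]

def gain(conds, residuals):
    """Squared-error gain of a conjunctive rule: (sum of covered residuals)^2 / coverage."""
    covered = [r for x, r in zip(X, residuals) if all(x[j] == v for j, v in conds)]
    return (sum(covered) ** 2) / len(covered) if covered else 0.0

def greedy_rule(residuals, max_len=2):
    """Grow one condition at a time, always keeping the locally best extension."""
    conds = []
    for _ in range(max_len):
        candidates = [(j, v) for j in range(3) for v in (0, 1) if (j, v) not in conds]
        best = max(candidates, key=lambda c: gain(conds + [c], residuals))
        if gain(conds + [best], residuals) <= gain(conds, residuals):
            break  # no extension improves the rule
        conds.append(best)
    return conds

def exhaustive_rule(residuals, max_len=2):
    """Score every conjunction up to max_len literals and return the global optimum."""
    literals = [(j, v) for j in range(3) for v in (0, 1)]
    best, best_gain = [], 0.0
    for k in range(1, max_len + 1):
        for conds in itertools.combinations(literals, k):
            g = gain(list(conds), residuals)
            if g > best_gain:
                best, best_gain = list(conds), g
    return best

residuals = list(y)  # in the first boosting round the residuals are just the labels
print(greedy_rule(residuals), exhaustive_rule(residuals))
```

By construction the exhaustive search can never return a worse rule than the greedy one, since every greedily reachable conjunction is in its search space; on harder data the greedy path can miss the optimum.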
“…In this paper, we investigate how to extend these ideas to the multi-label classification setting. The problem of controlling the number of rules has also been studied for single-label rule boosting, where learned rules are combined additively [15]. An extension to multi-label classification represents a possible direction of future work.…”
Section: Related Work
confidence: 99%
“…A similar issue arises in head sampling. The weight function in (15) grows exponentially with |D+|, so that CFTP most likely returns the positive data records with the highest number of present features. Therefore, sampled heads tend to be very long and have small support (often 1).…”
Section: Limitations of the Two-Stage Pattern-Sampling Framework
confidence: 99%
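The concentration effect described in this statement can be illustrated numerically. The exact weight function (15) is not reproduced here; the sketch below assumes a weight of the form w(x) = 2^|x| (one candidate head per subset of present features), which is the kind of exponential growth the statement describes. Under such a weight, sampling records proportionally to w puts almost all of the probability mass on the record with the most present features.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical positive data records: each is the set of features present in it.
records = [frozenset(random.sample(range(20), k)) for k in (3, 5, 8, 12, 16)]

# Assumed weight form for illustration: w(x) = 2^{|x|}, i.e. exponential in the
# number of present features of the record.
weights = [2 ** len(r) for r in records]
total = sum(weights)

# Sampling proportionally to w: the mass concentrates on the largest record.
draws = Counter(random.choices(range(len(records)), weights=weights, k=10_000))
print({i: round(w / total, 4) for i, w in enumerate(weights)})
```

Here the record with 16 present features alone carries over 90% of the sampling mass, which matches the observation that sampled heads tend to be very long with support often 1.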
“…Nonetheless, the main limitation of these approaches is that they are based on a heuristic definition of a rule-based model, i.e., they add rules without a globally optimal criterion. Over the past years, rule learning methods that go beyond greedy approaches have been developed, e.g., Monte Carlo search for Bayesian rule lists (Letham et al 2015; Yang et al 2017), and branch-and-bound with tight bounds for decision lists (Angelino et al 2017) and rule sets (Boley et al 2021). However, the main limitation of these methods is that they can only be applied to small or mid-size datasets and are mostly limited to binary targets.…”
Section: Rule-Based Classifiers
confidence: 99%
“…Another interesting and straightforward development would be the extension of our work to mixed targets, combining nominal and numeric variables. A second line of development could go from upper and lower bounds to improvements in search methods, and to studying the feasibility of global search methods such as the Markov chain Monte Carlo methods used by Yang et al (2017) or the branch-and-bound algorithms used by Boley et al (2021). In the third category, our approach could be formalised for subgroup sets, allowing for overlap between the subgroups.…”
Section: In Pattern Mining
confidence: 99%