2020
DOI: 10.1016/j.ejor.2019.12.002

Sparsity in optimal randomized classification trees

Abstract: Decision trees are popular Classification and Regression tools and, when small-sized, easy to interpret. Traditionally, a greedy approach has been used to build the trees, yielding a very fast training process; however, controlling sparsity (a proxy for interpretability) is challenging. In recent studies, optimal decision trees, where all decisions are optimized simultaneously, have shown better learning performance, especially when oblique cuts are implemented. In this paper, we propose a continuous optimiza…
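The randomized-split idea the abstract describes can be illustrated with a minimal sketch. Everything named below (`soft_split`, `depth1_objective`, the logistic smoothing, the temperature, the fixed leaf labels, and the single l1 penalty) is an illustrative assumption, not the paper's exact formulation: each branch node routes an instance left with a probability that is smooth in the oblique cut a·x + b, so the whole tree objective becomes continuous and can be handed to a generic optimizer, with a sparsity penalty on the cut coefficients.

```python
import numpy as np
from scipy.optimize import minimize

def soft_split(X, a, b, temperature=1.0):
    # Probability of routing each row of X to the left child of a node
    # with oblique cut a.x + b; the logistic CDF makes routing smooth
    # in (a, b). The smoothing choice is an assumption for illustration.
    return 1.0 / (1.0 + np.exp(-(X @ a + b) / temperature))

def depth1_objective(params, X, y, lam=0.1):
    # Expected misclassification rate of a depth-1 randomized tree whose
    # left leaf predicts class 0 and right leaf class 1 (hypothetical
    # leaf assignment), plus an l1 sparsity penalty on the cut weights.
    d = X.shape[1]
    a, b = params[:d], params[d]
    p_left = soft_split(X, a, b)
    p_correct = np.where(y == 0, p_left, 1.0 - p_left)
    return (1.0 - p_correct).mean() + lam * np.abs(a).sum()

# Toy usage: a derivative-free optimizer copes with the nonsmooth l1 term.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
res = minimize(depth1_objective, x0=np.zeros(X.shape[1] + 1),
               args=(X, y), method="Powell")
print(res.x[:5])  # most weight should land on the two informative features
```

Because the l1 term makes small coefficients collapse to zero, the fitted cut tends to use only the informative attributes, which is the sparsity-as-interpretability trade-off the abstract refers to.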

Cited by 40 publications (43 citation statements)
References 32 publications
“…The proposed model takes into account the trade-off between accuracy and the simplicity of the chosen rules and is solved via a column generation method. Blanquero et al. (2018a; 2018b) use a continuous optimization formulation to learn classification trees, where random decisions are made at internal nodes of the tree. Their approach is essentially a randomized optimal version of CART.…”
Section: Related Work
confidence: 99%
“…Hence, greedy-based heuristics such as CART (Breiman et al. 1984) and ID3 (Quinlan 1986) have been widely used to construct sub-optimal trees. Recent years have seen an increasing number of works that employ various Mathematical Optimization methods to build better-quality decision trees, e.g., (Bennett and Blue 1996; Bessiere, Hebrard, and O'Sullivan 2009; Bertsimas and Dunn 2017; Silva 2017; Dash, Günlük, and Wei 2018; Blanquero et al. 2018a; 2018b; Firat et al. 2018).…”
Section: Introduction
confidence: 99%
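For contrast with the optimized approaches discussed above, here is a minimal sketch of the myopic split search that CART-style greedy heuristics repeat at every node. The function names are illustrative; Gini impurity is the criterion CART uses, and the one-node scope is what makes the resulting tree sub-optimal.

```python
import numpy as np

def gini(y):
    # Gini impurity of a label vector (the CART splitting criterion).
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - (p ** 2).sum()

def best_axis_aligned_split(X, y):
    # Greedy one-node search: the single (feature, threshold) pair that
    # minimizes the weighted impurity of the two children, chosen
    # myopically -- earlier splits are never revisited, which is why the
    # overall tree can be far from optimal.
    best = (None, None, np.inf)
    n = len(y)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # thresholds keep both sides nonempty
            mask = X[:, j] <= t
            score = (mask.sum() * gini(y[mask])
                     + (~mask).sum() * gini(y[~mask])) / n
            if score < best[2]:
                best = (j, t, score)
    return best
```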
“…1b. Many different approaches have been undertaken to implement more optimal DTs [9, 11–14, 16, 18, 19]. The computational complexity of training non-greedy DTs, however, grows exponentially with the number of nodes, as opposed to linearly for greedy ones.…”
Section: Methods
confidence: 99%
“…The concern that GDTs are suboptimal was addressed long ago [9]. The problem of constructing a globally optimal DT is NP-hard [10]; hence, various optimization techniques, relying on linear programming [9, 11, 12], stochastic gradient descent [13], mixed-integer formulations [14], anytime induction [15], randomization [16], multilayer cascade structures [17], column generation techniques [18], and genetic algorithms [19], have been proposed to solve this problem. All of these methods seek to strike a balance between accuracy, simplicity and efficiency.…”
Section: Introduction
confidence: 99%
“…Recently, integer linear programming has also been employed to determine globally optimal univariate and oblique decision trees of an a priori specified maximum size [17, 18]. Blanquero et al. [19] instead develop a continuous optimization formulation to determine optimal randomized oblique decision trees. Additionally, penalty terms are introduced in the objective function to limit the overall number of attributes involved and the number of attributes per split, improving interpretability.…”
Section: Related Work and Contribution
confidence: 99%
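The two penalty terms this last statement describes can be written schematically. The symbols and norm choices below are assumptions inferred from the statement's wording, not a verbatim reproduction of the paper's model: R̂(a, b) stands for the expected misclassification cost of the randomized tree, τ_B for the set of branch nodes, and a_jt for the coefficient of attribute j in the oblique cut at node t.

```latex
\min_{a,\,b}\;\;
\widehat{R}(a,b)
\;+\; \lambda_{\mathrm{local}} \sum_{t \in \tau_B} \sum_{j=1}^{p} \lvert a_{jt} \rvert
\;+\; \lambda_{\mathrm{global}} \sum_{j=1}^{p} \max_{t \in \tau_B} \lvert a_{jt} \rvert
```

The l1 term charges for every attribute used at each individual split (local, per-split sparsity), while the max term charges once for any attribute used anywhere in the tree, so driving max_t |a_jt| to zero removes attribute j from the whole tree (global sparsity).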