2022
DOI: 10.48550/arxiv.2201.12682
Preprint

Geometry- and Accuracy-Preserving Random Forest Proximities

Abstract: Random forests are considered one of the best out-of-the-box classification and regression algorithms due to their high level of predictive performance with relatively little tuning. Pairwise proximities, which measure the similarity between data points relative to the supervised task, can be computed from a trained random forest. Random forest proximities have been used in many applications, including the identification of variable importance, data imputation, outlier detection, and data visualization. However, …
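As a point of reference for the abstract, the classical (Breiman) proximity between two samples is the fraction of trees in which they land in the same leaf. The sketch below illustrates that baseline definition only; it is not the geometry- and accuracy-preserving variant this preprint proposes, and the helper name `rf_proximity` is illustrative, not from the paper.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

def rf_proximity(forest, X):
    """Classical RF proximity: fraction of trees where two samples
    share a terminal leaf (illustrative helper, not the paper's RF-GAP)."""
    leaves = forest.apply(X)  # shape (n_samples, n_trees): leaf index per tree
    n = X.shape[0]
    prox = np.zeros((n, n))
    for t in range(leaves.shape[1]):
        # Pairwise "same leaf in tree t" indicator via broadcasting
        prox += leaves[:, t][:, None] == leaves[:, t][None, :]
    return prox / leaves.shape[1]

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
P = rf_proximity(rf, X)
# P is symmetric with ones on the diagonal; 1 - P acts as a dissimilarity,
# which is what enables the imputation/outlier/visualization uses listed above.
```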

Cited by 2 publications (4 citation statements)
References 44 publications
“…In previous work, we have shown that ensembles built using such splits do a better job of modeling the underlying patterns in data ( 25 ). This is also well supported by other work in the field ( 15 , 17 , 24 , 41 ). By using LANDMark, TreeOrdination also minimizes the impact of noisy features through randomization (bootstrapping of training data at each node, random selection of features, and models) and regularization (most models selected for splitting are L1 or L2 regularized) ( 21 , 25 ).…”
Section: Discussion (supporting)
confidence: 86%
“…Unlike statistical models, machine learning models tend not to assume anything about the underlying distribution of each feature ( 4 , 5 ). Furthermore, some machine learning models, such as random forest (RF) and related classifiers, are capable of identifying dependencies between features without the need for the user to explicitly include these dependencies in the model ( 11 , 14 17 ). One ability, arguably underused, inherent to this class of models is that they can be used in an “unsupervised” manner to learn a dissimilarity function ( 15 , 18 , 19 ).…”
Section: Introduction (mentioning)
confidence: 99%
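The "unsupervised" use of random forests mentioned in the citation above is commonly realized with Breiman's synthetic-contrast trick: train a forest to separate the real data from a column-permuted copy, then read off proximities as a learned dissimilarity. A minimal sketch under those assumptions (data and names here are illustrative, not from the cited work):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))

# Synthetic "noise" class: permute each column independently, which
# preserves marginal distributions but destroys feature dependencies.
X_synth = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])

X_all = np.vstack([X, X_synth])
y_all = np.concatenate([np.zeros(len(X)), np.ones(len(X_synth))])

# The forest learns structure by discriminating real from synthetic rows.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_all, y_all)

# Proximity on the real data: fraction of trees sharing a leaf, per pair.
leaves = rf.apply(X)  # (100, n_trees)
prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
dissim = 1.0 - prox  # a learned dissimilarity, usable for clustering/ordination
```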