Random forests are a powerful machine learning method that captures complex relationships between independent variables and an outcome of interest. The trees grown in a random forest depend on several hyperparameters, one of the more critical being node size. Breiman's original algorithm controls node size by limiting the size of the parent node: a node cannot be split if it contains fewer than a specified number of observations. We propose that this hyperparameter should instead be defined as the minimum number of observations in each terminal node. The two approaches are compared in the regression context in terms of estimated generalization error, squared bias, and variance of the resulting predictions across a number of simulated datasets. Additionally, both approaches are applied to type 2 diabetes data obtained from the National Health and Nutrition Examination Survey. We have also developed a straightforward method for incorporating weights into the random forest analysis of survey data. Our results demonstrate that generalization error under the proposed approach is competitive with that attained under the original random forest approach when the data exhibit large random error variability. The R code created from this work is available and includes an illustration.
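To make the distinction between the two stopping rules concrete, the following minimal sketch uses scikit-learn rather than the paper's released R code: min_samples_split limits the size of a parent node (Breiman's rule), while min_samples_leaf guarantees a minimum terminal node size (the proposed rule). Passing sample_weight to fit() is shown as one plausible route for incorporating survey weights; the simulated data, weight vector, and parameter values are illustrative assumptions, not the authors' settings or weighting method.

    # Illustrative sketch (scikit-learn); the paper's own implementation is in R.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))
    y = X[:, 0] ** 2 + rng.normal(scale=1.0, size=500)  # non-linear signal plus noise
    w = rng.uniform(0.5, 2.0, size=500)                 # stand-in survey weights

    # Original rule: a node with fewer than min_samples_split observations
    # is not split (parent-node limit).
    rf_parent = RandomForestRegressor(n_estimators=500, min_samples_split=10,
                                      random_state=0)
    rf_parent.fit(X, y)

    # Proposed rule: every terminal node must contain at least
    # min_samples_leaf observations.
    rf_leaf = RandomForestRegressor(n_estimators=500, min_samples_leaf=5,
                                    random_state=0)
    # sample_weight is one possible way to pass survey weights; the paper's
    # weighting method may differ.
    rf_leaf.fit(X, y, sample_weight=w)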
Random forests are a popular class of machine learning model; they are relatively robust to overfitting, unlike some other machine learning models, and adequately capture non-linear relationships between an outcome of interest and multiple independent variables. Standard random forest models have relatively few adjustable hyperparameters, among them the minimum size of the terminal nodes on each tree. The usual stopping rule, as proposed by Breiman, halts tree expansion by limiting the size of the parent nodes, so that a node cannot be split if it has fewer than a specified number of observations. Recently, an alternative stopping criterion was proposed that instead requires every terminal node to contain at least a minimum number of observations. The present paper proposes three generalisations of this idea for regression random forests, limiting tree growth based on the variance, range, or inter-centile range. The new approaches are applied to diabetes data obtained from the National Health and Nutrition Examination Survey and four other datasets (Tasmanian Abalone data, Boston Housing crime rate data, Los Angeles ozone concentration data, MIT servo data). The empirical analysis presented herein demonstrates that the new stopping rules yield mean square prediction error competitive with that of standard random forest models. In general, use of the inter-centile range statistic to control tree expansion yields much less variation in mean square prediction error, and the error is also closer to the optimum. The Fortran code developed is provided in the Supplementary Material.
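The sketch below illustrates how such node-level stopping rules might be evaluated. It assumes each spread statistic is computed on the outcome values falling in a candidate node, and that a node becomes terminal once its spread drops below a threshold; the paper's released implementation is in Fortran, and the threshold values and the 10th/90th percentile bounds used here are illustrative assumptions, not the authors' settings.

    # Hypothetical node-stopping check for a regression tree: stop expanding
    # a node when the outcome values it contains are already homogeneous
    # according to one of three spread statistics.
    import numpy as np

    def stop_splitting(y_node, rule="intercentile", threshold=0.1,
                       lower=10, upper=90):
        """Return True if the node should become a terminal node."""
        y_node = np.asarray(y_node, dtype=float)
        if rule == "variance":
            spread = y_node.var()
        elif rule == "range":
            spread = y_node.max() - y_node.min()
        elif rule == "intercentile":
            # Spread between two percentiles of the node's outcome values;
            # the 10th and 90th percentiles here are assumed, not the paper's.
            lo, hi = np.percentile(y_node, [lower, upper])
            spread = hi - lo
        else:
            raise ValueError(f"unknown rule: {rule}")
        return spread <= threshold

    # Example: a tight cluster of outcome values triggers the stop.
    print(stop_splitting([1.00, 1.02, 0.99, 1.01], rule="variance",
                         threshold=0.01))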