2000
DOI: 10.1023/a:1007631014630

Multiple Comparisons in Induction Algorithms (David D. Jensen and Paul R. Cohen, Machine Learning)

Abstract: A single mechanism is responsible for three pathologies of induction algorithms: attribute selection errors, overfitting, and oversearching. In each pathology, induction algorithms compare multiple items based on scores from an evaluation function and select the item with the maximum score. We call this a multiple comparison procedure (MCP). We analyze the statistical properties of MCPs and show how failure to adjust for these properties leads to the pathologies. We also discuss approaches that can co…
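The central statistical point of the abstract, that the maximum of many evaluation scores is inflated even when no item is genuinely better than the others, can be checked with a small Monte Carlo sketch (hypothetical numbers, not a reproduction of the paper's analysis):

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_max_null_score(n_candidates, n_trials=10_000):
    """Average maximum of n_candidates i.i.d. standard-normal 'scores',
    i.e. all candidates are equally uninformative."""
    scores = rng.standard_normal((n_trials, n_candidates))
    return scores.max(axis=1).mean()

for k in (1, 5, 25, 125):
    print(f"{k:>3} candidates: E[max score] ~ {expected_max_null_score(k):.2f}")

# The expected maximum grows with the number of comparisons, so an induction
# algorithm that keeps the top-scoring item without adjusting for this
# inflation will overestimate that item's quality.
```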

Citation types: 0 supporting, 17 mentioning, 0 contrasting
Year published (citing statements): 2005–2021

Publication Types

Select...
5
2
1
1

Relationship

0
9

Authors

Journals

Cited by 165 publications (17 citation statements)
References 40 publications
“…The Classification and Regression Tree (CART) algorithm is the most widely used algorithm to construct a Random Forest. Some studies, however, recognized a bias, with respect to variable selection, toward variables with different scales and many possible splits within the CART algorithm [40][41][42][43][44][45]. Hence, the Conditional Inference Tree (CIT) algorithm was developed to overcome this bias and improve the interpretability of the trees [46].…”
Section: Random Forests (mentioning)
confidence: 99%
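The variable-selection bias described in this statement can be reproduced with a short simulation (a hedged sketch, not the setup of the cited studies): when a split is chosen by exhaustively maximizing a CART-style impurity reduction, a covariate that offers many candidate cut points is preferred over a low-cardinality covariate even when neither is related to the outcome.

```python
import numpy as np

rng = np.random.default_rng(1)

def best_split_gain(x, y):
    """Best reduction in squared error over all cut points of x (CART-style)."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    total = ((ys - ys.mean()) ** 2).sum()
    best = 0.0
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # identical values: no cut point between them
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        best = max(best, total - sse)
    return best

wins_continuous, n_sim, n = 0, 2000, 50
for _ in range(n_sim):
    y = rng.standard_normal(n)                    # outcome unrelated to both covariates
    x_many = rng.standard_normal(n)               # continuous: ~49 candidate cut points
    x_few = rng.integers(0, 2, n).astype(float)   # binary: a single cut point
    wins_continuous += best_split_gain(x_many, y) > best_split_gain(x_few, y)

print(f"continuous covariate selected in {wins_continuous / n_sim:.0%} of null datasets")
# Well above 50%, which is the bias that conditional inference trees avoid by
# separating variable selection (permutation tests) from cut-point search.
```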
“…An example of this would be to determine at first the relevant inputs to use with the forecasting methods, such as conducting feature selections using a wrapper approach [26]. These methods may have prohibitive computational cost when working with the full datasets, while increasing the risk of oversearching the space of forecasting methods [27]. Working on smaller but representative subsets for hyperparameter tuning or feature selection allows the computation time to be reduced, while optimizing over only a small part of the full training set, keeping the rest of the training set untouched for the final training.…”
Section: Results (mentioning)
confidence: 99%
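One way to realize the subset strategy described in the quote is sketched below with scikit-learn (an illustrative choice of library, estimator, and data sizes, not the cited work's pipeline): the wrapper-style feature search only sees a stratified subset, and the chosen features are then refit on the untouched full training set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20_000, n_features=20, n_informative=6,
                           random_state=0)

# The expensive wrapper search runs only on a small, stratified subset, which
# keeps computation cheap and limits how hard the method space is searched.
X_sub, _, y_sub, _ = train_test_split(X, y, train_size=2_000, stratify=y,
                                      random_state=0)
selector = SequentialFeatureSelector(
    RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=5, direction="forward", cv=3)
selector.fit(X_sub, y_sub)

# The final model is trained on the full training set, restricted to the
# features chosen on the subset.
model = RandomForestClassifier(n_estimators=300, random_state=0)
model.fit(selector.transform(X), y)
```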
“…Besides the Bonferroni correction, different cross-validation methods are implemented in the semtree package. Cross-validation separates the estimation of SEMs from the testing of a potential cut point (e.g., Jensen and Cohen, 2000). SEM trees can be grown with a two-stage approach (Loh and Shih, 1997;Shih, 2004;Brandmaier et al, 2013b) that splits the sample associated with a node in half.…”
Section: Structural Equation Model Trees (mentioning)
confidence: 99%
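The split-half idea mentioned in the quote can be illustrated generically (a simplified Python sketch, not the semtree implementation, using a plain mean difference in place of a structural equation model): one half of the node's sample chooses the cut point, the other half tests it, so the same data never both selects and evaluates a split.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def two_stage_split_test(x, y):
    """Choose a cut point on one random half, test it on the other half."""
    idx = rng.permutation(len(x))
    a, b = idx[: len(x) // 2], idx[len(x) // 2:]

    # Stage 1: on half A, pick the interior cut point maximizing |t|.
    candidates = np.quantile(x[a], np.linspace(0.2, 0.8, 13))
    best_cut = max(candidates,
                   key=lambda c: abs(stats.ttest_ind(y[a][x[a] <= c],
                                                     y[a][x[a] > c]).statistic))

    # Stage 2: on half B, test that single, pre-chosen cut point.
    t, p = stats.ttest_ind(y[b][x[b] <= best_cut], y[b][x[b] > best_cut])
    return best_cut, p

x = rng.uniform(size=200)
y = rng.standard_normal(200)        # null case: y is unrelated to x
print(two_stage_split_test(x, y))   # the reported p-value is not inflated by the search
```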
“…Another problem of the current semtree package is that the standard approach to split evaluation (called naïve selection approach in semtree) is biased by favoring the selection of covariates with many unique values over covariates with few unique values (Brandmaier et al, 2013b). The semtree package offers a correction procedure (fair selection approach) for this selection bias (also known as attribute selection error; Jensen and Cohen, 2000). However, this correction procedure is heuristic and comes at the price of decreased statistical power to detect group differences.…”
Section: Introduction (mentioning)
confidence: 99%
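A Bonferroni-style adjustment of the kind referred to here can be sketched abstractly (an illustrative Python fragment with hypothetical inputs, not necessarily semtree's exact procedure): before covariates are compared, each covariate's best split p-value is multiplied by the number of cut points examined for it, which removes the advantage of high-cardinality covariates at the cost of statistical power.

```python
def select_covariate(best_p_per_covariate, n_cuts_per_covariate):
    """Pick a covariate by Bonferroni-adjusted best-split p-values.

    best_p_per_covariate -- smallest raw split p-value found per covariate
    n_cuts_per_covariate -- number of candidate cut points examined per covariate
    (both arguments are hypothetical inputs for this sketch)
    """
    adjusted = {name: min(1.0, p * n_cuts_per_covariate[name])
                for name, p in best_p_per_covariate.items()}
    winner = min(adjusted, key=adjusted.get)
    return winner, adjusted[winner]

# A binary covariate (1 cut point) versus a continuous one (40 cut points) with
# a slightly smaller raw p-value: after adjustment the binary covariate wins.
print(select_covariate({"binary": 0.010, "continuous": 0.008},
                       {"binary": 1, "continuous": 40}))
```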