Resampling Methods for Meta-Model Validation with Recommendations for Evolutionary Computation

Bischl, Bernd; Mersmann, Olaf; Trautmann, Heike; Weihs, Claus

doi:10.1162/evco_a_00069

Cited by 154 publications

(124 citation statements)

References 57 publications

Mentioning: 124


“…It has been observed that the holdout estimator tends to be too pessimistic because only a proportion of the data is used to build the model (Bischl et al 2012). Correspondingly, a variation of the holdout method, which partially alleviates this biased behavior, consists of replicating the partition into training and test sets several times in different random ways; the classifier is trained and tested for each partition and the performances averaged to yield an overall estimate, which is generally more reliable.…”

Section: Data Splitting Methods (mentioning)

confidence: 99%
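
The repeated holdout procedure described in this statement can be sketched as follows. This is a minimal illustration, not code from Bischl et al. (2012) or from the citing paper: the toy nearest-mean classifier, the function names, the 2/3 training fraction, the 50 repetitions, and the synthetic data are all assumptions chosen to keep the example self-contained.

```python
import random
from statistics import mean


def nearest_mean_fit(xs, ys):
    """Toy classifier (assumed for illustration): mean feature value per class."""
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in sorted(set(ys))}


def nearest_mean_predict(model, x):
    """Predict the class whose stored mean is closest to x."""
    return min(model, key=lambda c: abs(x - model[c]))


def repeated_holdout(xs, ys, train_fraction=2 / 3, repeats=50, seed=0):
    """Average the test error over several random train/test partitions."""
    rng = random.Random(seed)
    n = len(xs)
    n_train = int(round(train_fraction * n))
    errors = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random partition in every repetition
        train, test = idx[:n_train], idx[n_train:]
        model = nearest_mean_fit([xs[i] for i in train], [ys[i] for i in train])
        wrong = sum(nearest_mean_predict(model, xs[i]) != ys[i] for i in test)
        errors.append(wrong / len(test))
    return mean(errors), errors


# Synthetic two-class data (hypothetical): one feature, class means 0 and 3.
data_rng = random.Random(1)
xs = [data_rng.gauss(0, 1) for _ in range(30)] + [data_rng.gauss(3, 1) for _ in range(30)]
ys = [0] * 30 + [1] * 30

avg_error, per_split_errors = repeated_holdout(xs, ys)
print(f"mean misclassification rate over {len(per_split_errors)} splits: {avg_error:.3f}")
```

Each split still trains on only part of the data, so the pessimistic bias of a single holdout remains; averaging over partitions mainly reduces the dependence of the estimate on one particular split.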

“…With a large number of subsets, the estimator will be very accurate, but the variance will be large. Conversely, with a reduced number of subsets, the variance will be small, but the estimator will be largely biased (i.e., too conservative) (Bischl et al 2012). Although K = 5 and K = 10 are common choices that perform reasonably well for data sets of different sizes, it is worth noting that for very small data sets, a bigger value of K (or even the leave-one-out method) may become slightly preferable in order to train on as many examples as possible.…”

Section: Data Splitting Methods (mentioning)

confidence: 99%
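
The role of K in this statement can be made concrete with a small K-fold cross-validation sketch. It is an assumption-laden illustration rather than the procedure of any cited paper: the helper names `kfold_indices` and `cv_mse`, the mean-only predictor, and the example values are hypothetical. Setting K equal to the number of observations gives leave-one-out.

```python
import random


def kfold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k disjoint folds whose sizes differ by at most one."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_mse(ys, k, seed=0):
    """K-fold estimate of the squared error of a predictor that outputs the training mean."""
    fold_errors = []
    for test in kfold_indices(len(ys), k, seed):
        held_out = set(test)
        train_values = [y for i, y in enumerate(ys) if i not in held_out]
        prediction = sum(train_values) / len(train_values)
        fold_errors.append(sum((ys[i] - prediction) ** 2 for i in test) / len(test))
    return sum(fold_errors) / k


# Hypothetical responses; each model trains on n - n/K observations.
ys = [2.1, 1.9, 2.4, 2.0, 1.7, 2.2, 2.3, 1.8, 2.0, 2.1]
for k in (2, 5, len(ys)):  # len(ys) folds is leave-one-out
    print(f"K = {k:2d}: estimated MSE = {cv_mse(ys, k):.4f}")
```

With larger K each model is fitted on a larger share of the data, which is why the statement favors large K, or leave-one-out, for very small data sets.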


An insight into the experimental design for credit risk and corporate bankruptcy prediction systems

2014


In recent years, the finance and business communities have shown increasing interest in any application tool related to the prediction of credit and bankruptcy risk, probably because of the need for more robust decision-making systems capable of managing and analyzing complex data. As a result, many techniques have been developed with the aim of producing accurate prediction models that are able to tackle these issues. However, the design of experiments to assess and compare these models has attracted little attention so far, even though it plays an important role in validating and supporting the theoretical evidence of performance. The experimental design should be done carefully for the results to hold significance; otherwise, it might be a potential source of misleading and contradictory conclusions about the benefits of using a particular prediction system. In this work, we review more than 140 papers published in refereed journals within the period 2000-2013, putting the emphasis on the bases of the experimental design in credit scoring and bankruptcy prediction applications. We provide some caveats and guidelines for the usage of databases, data splitting methods, performance evaluation metrics and hypothesis testing procedures in order to converge on a systematic, consistent validation standard.


“…Simulations [7] may also be used to generate new data. A tool for the enrichment of data bases to fill data gaps is the imputation of missing data [31].…”

Section: Data Acquisition and Enrichment (mentioning)

confidence: 99%

“…Predictive power is typically assessed by means of so-called resampling methods where the distribution of power characteristics is studied by artificially varying the subpopulation used to learn the model. Characteristics of such distributions can be used for model selection [7].…”

Section: Model Validation and Model Selection (mentioning)

confidence: 99%

Data Science: the impact of statistics

Weihs

Ickstadt

2018

Int J Data Sci Anal


In this paper, we substantiate our premise that statistics is one of the most important disciplines to provide tools and methods to find structure in and to give deeper insight into data, and the most important discipline to analyze and quantify uncertainty. We give an overview of different proposed structures of Data Science and address the impact of statistics on such steps as data acquisition and enrichment, data exploration, data analysis and modeling, validation and representation and reporting. Also, we indicate fallacies when neglecting statistical reasoning.


“…We may apply bootstrapping to estimate the distribution of these validation statistics; see Bischl et al (2012).…”

Section: Validation Of Metamodels (mentioning)

confidence: 99%
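
One way to read this statement is as an out-of-bag bootstrap: refit the metamodel on each bootstrap sample and compute the validation statistic on the points left out of it. The sketch below follows that reading under stated assumptions. The straight-line metamodel, the RMSE statistic, the function names, the 200 replicates, and the synthetic data are illustrative choices and are not taken from Kleijnen (2017) or Bischl et al. (2012).

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares straight line; returns (slope, intercept), or None if all x are equal."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return None
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def bootstrap_oob_rmse(xs, ys, replicates=200, seed=0):
    """Out-of-bag RMSE of the fitted line, one value per usable bootstrap replicate."""
    rng = random.Random(seed)
    n = len(xs)
    stats = []
    for _ in range(replicates):
        boot = [rng.randrange(n) for _ in range(n)]  # sample n indices with replacement
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]
        model = fit_line([xs[i] for i in boot], [ys[i] for i in boot])
        if model is None or not oob:
            continue  # skip degenerate replicates
        slope, intercept = model
        squared = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in oob]
        stats.append(math.sqrt(sum(squared) / len(squared)))
    return stats


# Hypothetical simulation output: y = 2x + 1 plus small Gaussian noise.
noise = random.Random(42)
xs = [i / 10 for i in range(30)]
ys = [2 * x + 1 + noise.gauss(0, 0.1) for x in xs]

rmse_values = sorted(bootstrap_oob_rmse(xs, ys))
m = len(rmse_values)
lower = rmse_values[int(0.025 * m)]
upper = rmse_values[min(m - 1, int(0.975 * m))]
print(f"{m} replicates, central 95% of out-of-bag RMSE: [{lower:.3f}, {upper:.3f}]")
```

The sorted list `rmse_values` approximates the distribution of the validation statistic, so other summaries such as the median or the standard deviation can be read from it in the same way.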

Regression and Kriging metamodels with their experimental designs in simulation: A review

Kleijnen

2017

European Journal of Operational Research

263


This article reviews the design and analysis of simulation experiments. It focusses on analysis via either low-order polynomial regression or Kriging (also known as Gaussian process) metamodels. The type of metamodel determines the design of the experiment, which determines the input combinations of the simulation experiment. For example, a first-order polynomial metamodel requires a "resolution-III" design, whereas Kriging may use Latin hypercube sampling. Polynomials of first or second order require resolution III, IV, V, or "central composite" designs. Before applying either regression or Kriging, sequential bifurcation may be applied to screen a great many inputs. Optimization of the simulated system may use either a sequence of low-order polynomials known as response surface methodology (RSM) or Kriging models fitted through sequential designs including efficient global optimization (EGO). The review includes robust optimization, which accounts for uncertain simulation inputs.
