Classification problems with uneven class distributions present several difficulties during both the training and the evaluation of classifiers. A problem with these characteristics arose from a data-mining project whose objective was to predict customer insolvency. Using the dataset from the customer insolvency problem, we study several alternative methodologies that have been reported to better suit the specific characteristics of this type of problem. Three different but equally important directions are examined: (a) the performance measures that should be used for problems in this domain, (b) the class distributions that should be used for the training data sets, and (c) the classification algorithms to be used. The final evaluation of the resulting classifiers is based on a study of the economic impact of the classification results. The study concludes with a framework that provides the "best" classifiers, identifies the performance measures that should be used as the decision criterion, and suggests the "best" class distribution based on the value of the relative gain from correct classification in the positive class. This framework has been applied to the customer insolvency problem, but we claim that it can be applied to many similar problems with uneven class distributions, which almost always require a multi-objective evaluation process.
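To illustrate why accuracy alone can mislead on uneven class distributions, and why a gain-based criterion may rank classifiers differently, the following is a minimal sketch and not the evaluation procedure used in the study. The data are synthetic, the function names are hypothetical, and the gain model (each correctly flagged insolvent customer earns `relative_gain` units, each false alarm costs one unit) is an assumption made purely for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 is the positive (insolvent) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def net_gain(tp, fp, relative_gain):
    """Assumed gain model: each true positive earns `relative_gain`, each false positive costs 1."""
    return relative_gain * tp - fp


# Synthetic data (assumption): 1000 customers, 10 of them insolvent (1%).
y_true = [1] * 10 + [0] * 990

# Classifier A always predicts the majority (solvent) class.
pred_a = [0] * 1000

# Classifier B flags 8 of the 10 insolvent customers and raises 40 false alarms.
pred_b = [1] * 8 + [0] * 2 + [1] * 40 + [0] * 950

counts_a = confusion_counts(y_true, pred_a)
counts_b = confusion_counts(y_true, pred_b)

acc_a = accuracy(*counts_a)
acc_b = accuracy(*counts_b)

# Net gain under two assumed values of the relative gain per true positive.
gain_a_high = net_gain(counts_a[0], counts_a[1], relative_gain=10)
gain_b_high = net_gain(counts_b[0], counts_b[1], relative_gain=10)
gain_b_low = net_gain(counts_b[0], counts_b[1], relative_gain=2)
```

On this synthetic data, classifier A has the higher accuracy (0.99 against 0.958) yet never detects an insolvent customer, so its net gain is 0. Classifier B yields a net gain of 40 when the relative gain is 10, but a net gain of -24 when the relative gain is 2. Under this assumed gain model, which classifier is preferable therefore depends on the relative gain from correct classification in the positive class.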