[j43] Daskalaki S., Kopanas I., Avouris N., Evaluation of classifiers for an uneven class distribution problem, Applied Artificial Intelligence, 20, pp. 381-417, 2006.
Keywords: data mining, classification, imbalanced class distributions, voting algorithms, cost-sensitive learning
Classification problems with uneven class distributions present several difficulties both during training and during the
evaluation of classifiers. A classification problem with such
characteristics arose in a data-mining project whose objective was to predict
customer insolvency. Using the dataset from this customer insolvency problem, we study several
alternative methodologies that have been reported to better suit the specific characteristics of
this type of problem. Three different but equally important directions are examined: (a) the
performance measures that should be used for problems in this domain, (b) the class distributions
that should be used for the training data sets, and (c) the classification algorithms to be used. The final
evaluation of the resulting classifiers is based on a study of the economic impact of classification
results. This study leads to a framework that provides the "best" classifiers, identifies the
performance measures that should be used as the decision criterion, and suggests the "best" class
distribution based on the value of the relative gain from correct classification in the positive class.
This framework has been applied to the customer insolvency problem, but we argue that it can
be applied to many similar problems with uneven class distributions, which almost always require a
multi-objective evaluation process.
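To illustrate why plain accuracy can mislead when one class is rare, the following Python sketch computes a few commonly used measures from confusion-matrix counts: accuracy, TP rate, TN rate and their geometric mean. It also computes a simple gain-weighted score. The function names and the `weighted_gain` formula, where each correct positive is worth `relative_gain` correct negatives, are illustrative assumptions. They are not the specific measures or economic model defined in the paper.

```python
import math


def rates(tp, fn, tn, fp):
    """Return (accuracy, TP rate, TN rate, geometric mean) from confusion counts."""
    pos = tp + fn  # actual positives (e.g. insolvent customers)
    neg = tn + fp  # actual negatives
    accuracy = (tp + tn) / (pos + neg)
    tp_rate = tp / pos if pos else 0.0  # recall on the positive class
    tn_rate = tn / neg if neg else 0.0  # recall on the negative class
    g_mean = math.sqrt(tp_rate * tn_rate)  # zero if either class is ignored
    return accuracy, tp_rate, tn_rate, g_mean


def weighted_gain(tp, tn, relative_gain):
    """Hypothetical score: a correct positive counts relative_gain times a correct negative."""
    return relative_gain * tp + tn
```

Consider 1,000 cases with 50 positives. A classifier that labels every case negative reaches accuracy 0.95, but its TP rate and geometric mean are both 0. A classifier that finds 40 of the 50 positives at the cost of 100 false alarms has lower accuracy (0.89), but its geometric mean is about 0.85. With `relative_gain=10` its weighted gain is also higher: 1250 versus 950.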