[v17] S. Daskalaki, I. Kopanas, N.M. Avouris, Predictive Classification with Imbalanced Enterprise Data, in T.W. Liao and E. Triantaphyllou (Eds.), Recent Advances in Data Mining of Enterprise Data: Algorithms and Applications, World Scientific, Singapore, pp. 147-187, 2008
Enterprise data present several difficulties when used in data mining projects. Apart from being heterogeneous, noisy and disparate, they are sometimes also characterized by major imbalances between events of different classes. Predictive classification with imbalanced enterprise data calls for methodologies suited to such data, especially for training algorithms and for evaluating the resulting classifiers. It is therefore important to experiment with several class distributions in the training sets and with a variety of performance measures that are known to better expose the strengths and weaknesses of classification algorithms. In addition, combining classifiers into schemes suited to the specific business domain may well improve predictions. However, the final evaluation of the classifiers must always be based on the impact of the classification results on the enterprise, which can take the form of a cost model that reflects the requirements of the enterprise and existing knowledge. In this chapter, taking a telecommunications company as an example, we provide the methodological framework for handling enterprise data during the initial phases of the project, as well as for generating and evaluating predictive classifiers. Moreover, we provide the design of a decision support system that integrates the previously described process into the daily routine of a telecommunications company that struggles to prevent customer insolvency without risking customer relations.
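As an illustration of why accuracy alone can mislead on imbalanced data and why a cost model may rank classifiers differently, here is a minimal Python sketch. It is not taken from the chapter: the confusion-matrix counts, the per-error costs, and all names (`Confusion`, `total_cost`, `always_solvent`, `flagger`) are hypothetical, chosen only to make the arithmetic visible.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts; 'positive' means an insolvent customer."""
    tp: int  # insolvent customers correctly flagged
    fp: int  # solvent customers wrongly flagged
    fn: int  # insolvent customers missed
    tn: int  # solvent customers correctly left alone


def accuracy(c: Confusion) -> float:
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: Confusion) -> float:
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: Confusion) -> float:
    positives = c.tp + c.fn
    return c.tp / positives if positives else 0.0


def total_cost(c: Confusion, cost_fp: float, cost_fn: float) -> float:
    """Simple linear cost model: only misclassifications incur a cost."""
    return c.fp * cost_fp + c.fn * cost_fn


# Hypothetical test set: 10,000 customers, 100 of them insolvent (1%).
always_solvent = Confusion(tp=0, fp=0, fn=100, tn=9900)  # never flags anyone
flagger = Confusion(tp=70, fp=300, fn=30, tn=9600)       # flags 370 customers

# Assumed costs: a false alarm costs 5 units, a missed insolvency 100 units.
COST_FP, COST_FN = 5, 100

for name, c in [("always_solvent", always_solvent), ("flagger", flagger)]:
    print(
        f"{name}: accuracy={accuracy(c):.3f} precision={precision(c):.3f} "
        f"recall={recall(c):.3f} cost={total_cost(c, COST_FP, COST_FN)}"
    )
```

Under these assumed numbers the trivial classifier has the higher accuracy but also the higher total cost, because it misses every insolvent customer. A real cost model would take its values from the enterprise's own requirements rather than from fixed constants like these.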