Predictive Data Mining for Highly Imbalanced Classification

Date Added: Dec 2012
Format: PDF

The paper addresses theoretical and practical aspects of data mining, focusing on predictive data mining and its two central types of prediction problems: classification and regression. Particular emphasis is placed on time-stamped data, which greatly increases the dimensionality and complexity of the problem. The main goal is to process historical records in order to describe the underlying dynamics of a complex system and predict its future behavior. Traditional classification algorithms can perform poorly on highly imbalanced datasets. A popular line of work for countering class imbalance applies a variety of sampling strategies. In this paper, the authors focus on the problem of class imbalance and incorporate different "Rebalance" heuristics.
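
To make the idea of a sampling strategy concrete, here is a minimal sketch of random oversampling, one of the simplest ways to rebalance a dataset. It is an illustration only and is not the authors' "Rebalance" heuristics, which this summary does not describe. The function name random_oversample, the seed parameter, and the toy data are all assumptions made for the example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many examples as the largest class.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    # Start from the original data and append duplicates to it.
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (made-up values): 8 examples of class 0, 2 of class 1.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

After the call, both classes in the toy data have 8 examples each. Oversampling should normally be applied to the training split only, since duplicating rows before splitting would leak copies of the same record into the evaluation data.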