Scalable Regression Tree Learning on Hadoop Using OpenPlanet
As scientific and engineering domains attempt to effectively analyze the deluge of data arriving from sensors and instruments, machine learning is becoming a key data mining tool for building prediction models. A regression tree is a popular learning model that combines decision trees and linear regression to predict numerical target variables from a set of input features. MapReduce is well suited to such data-intensive learning applications, and PLANET, a proprietary regression tree algorithm built on MapReduce, was proposed earlier. In this paper, the authors describe OpenPlanet, an open-source implementation of this algorithm on the Hadoop framework that uses a hybrid approach.
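To make the MapReduce fit more concrete, the following is a toy, single-process sketch of how split selection for one regression tree node can be expressed as a map step and a reduce step. It assumes a scheme in which each mapper scans its input partition and emits partial sufficient statistics (count, sum of y, sum of y squared) for every candidate split, and a reducer sums those statistics and picks the split with the lowest total squared error. The dataset, the candidate thresholds, and the function names (`map_partition`, `reduce_stats`, `best_split`) are hypothetical illustrations. This is not OpenPlanet's code, and it does not use the Hadoop API.

```python
from collections import defaultdict

# Hypothetical toy data: (feature value x, numeric target y) pairs,
# pre-split into two input partitions as two mappers would see them.
partitions = [
    [(1.0, 2.0), (2.0, 2.5), (3.0, 3.0)],
    [(6.0, 9.0), (7.0, 9.5), (8.0, 10.0)],
]
candidate_thresholds = [2.5, 4.5, 7.5]  # hypothetical split points on x


def map_partition(records, thresholds):
    """Map step: emit (threshold, side) -> [count, sum_y, sum_y2] for one partition."""
    out = defaultdict(lambda: [0, 0.0, 0.0])
    for x, y in records:
        for t in thresholds:
            side = "left" if x <= t else "right"
            stats = out[(t, side)]
            stats[0] += 1
            stats[1] += y
            stats[2] += y * y
    return out


def reduce_stats(mapper_outputs):
    """Reduce step: sum the partial statistics for each key across all mappers."""
    total = defaultdict(lambda: [0, 0.0, 0.0])
    for out in mapper_outputs:
        for key, (n, sy, sy2) in out.items():
            acc = total[key]
            acc[0] += n
            acc[1] += sy
            acc[2] += sy2
    return total


def sse(n, sy, sy2):
    """Sum of squared errors around the mean, computed from sufficient statistics."""
    return sy2 - sy * sy / n if n else 0.0


def best_split(total, thresholds):
    """Pick the threshold whose left + right squared error is smallest."""
    best = None
    for t in thresholds:
        cost = sse(*total[(t, "left")]) + sse(*total[(t, "right")])
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


stats = reduce_stats(map_partition(p, candidate_thresholds) for p in partitions)
threshold, cost = best_split(stats, candidate_thresholds)
print(threshold, cost)  # 4.5 1.0
```

Because count, sum, and sum of squares are additive, mappers never need to exchange raw records; only small per-split aggregates cross the shuffle. Minimizing the summed squared error of the two children is equivalent to maximizing variance reduction at the node.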