Scalable Regression Tree Learning on Hadoop Using OpenPlanet

Executive Summary

As scientific and engineering domains attempt to analyze the deluge of data arriving from sensors and instruments, machine learning is becoming a key data mining tool for building prediction models. A regression tree is a popular learning model that combines decision trees and linear regression to forecast numerical target variables from a set of input features. MapReduce is well suited to such data-intensive learning applications, and a proprietary MapReduce-based regression tree algorithm, PLANET, was proposed earlier. In this paper, the authors describe OpenPlanet, an open source implementation of this algorithm on the Hadoop framework that uses a hybrid approach.
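To illustrate why regression tree learning maps naturally onto MapReduce, here is a minimal single-process sketch, not OpenPlanet's or PLANET's actual code. It assumes constant (mean) leaf predictions and a fixed list of candidate thresholds. The "map" step emits partial sufficient statistics (count, sum, sum of squares) per candidate split side, and the "reduce" step adds them up. Because these statistics are additive, the best split by squared error can be chosen without bringing all the raw data to one machine. The function names `map_phase`, `reduce_phase`, and `best_split` are hypothetical.

```python
from collections import defaultdict


def map_phase(records, feature, thresholds):
    """Hypothetical mapper: for each record and candidate threshold, emit
    ((threshold, side), (count, sum, sum_of_squares)) partial statistics."""
    for x, y in records:
        v = x[feature]
        for t in thresholds:
            side = "left" if v <= t else "right"
            yield (t, side), (1, y, y * y)


def reduce_phase(pairs):
    """Hypothetical reducer: sum the additive statistics for each key."""
    totals = defaultdict(lambda: [0, 0.0, 0.0])
    for key, (n, s, ss) in pairs:
        acc = totals[key]
        acc[0] += n
        acc[1] += s
        acc[2] += ss
    return totals


def sse(n, s, ss):
    """Sum of squared errors around the mean, computed from (n, sum, sum_sq)."""
    return ss - s * s / n if n else 0.0


def best_split(records, feature, thresholds):
    """Return (threshold, cost) minimizing left SSE + right SSE, or None."""
    stats = reduce_phase(map_phase(records, feature, thresholds))
    best = None
    for t in thresholds:
        left = stats.get((t, "left"))
        right = stats.get((t, "right"))
        if not left or not right:
            continue  # skip splits that leave one side empty
        cost = sse(*left) + sse(*right)
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# Toy data: targets jump between x = 2 and x = 3.
records = [({"x": 1.0}, 1.0), ({"x": 2.0}, 1.2), ({"x": 3.0}, 5.0), ({"x": 4.0}, 5.2)]
best = best_split(records, "x", [1.5, 2.5, 3.5])
print(best)  # threshold 2.5, cost about 0.04
```

In an actual Hadoop job, a combiner would typically pre-sum these statistics on each mapper node to reduce shuffle traffic. The sketch keeps everything in one process for clarity.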

  • Format: PDF
  • Size: 1464.32 KB