
High-Speed Data Stream Mining Using VFDT

Date Added: Jan 2012
Format: PDF

Some databases grow without limit at a rate of several million records per day, and mining these continuous data streams brings unique opportunities to researchers. Here, the authors describe and evaluate VFDT, an anytime system that builds decision trees using constant memory and constant time per example. VFDT can incorporate tens of thousands of examples per second. It uses Hoeffding bounds to guarantee that its output is asymptotically nearly identical to that of a conventional learner. The authors demonstrate its utility through a set of experiments on synthetic data.
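
To make the role of the Hoeffding bound concrete, the following is a minimal sketch, not the authors' implementation. The Hoeffding bound says that after n independent observations of a variable with range R, the true mean lies within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the observed mean with probability 1 - delta. A streaming tree learner can use this to decide when it has seen enough examples at a leaf to commit to a split. The function names, the split rule as written here, and the sample numbers are illustrative assumptions rather than details taken from the paper.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon for the Hoeffding bound.

    With probability 1 - delta, the true mean of a variable whose values span
    `value_range` is within epsilon of the mean of `n` independent observations.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of observations")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Hypothetical split test (an assumption for illustration).

    Split on the best attribute once its observed advantage over the
    runner-up exceeds epsilon, so the ranking is unlikely to change
    even if more examples were seen.
    """
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # Illustrative numbers only: a two-class problem where information gain
    # has range log2(2) = 1, and a strict confidence setting.
    for n in (100, 1000, 10000):
        eps = hoeffding_bound(1.0, 1e-7, n)
        print(n, round(eps, 4), should_split(0.30, 0.10, 1.0, 1e-7, n))
```

Because epsilon shrinks in proportion to 1/sqrt(n), a leaf only needs to keep running counts rather than the examples themselves, which is consistent with the constant memory and constant time per example described above.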