Data Management

Bayesian Classifiers Programmed in SQL

Date Added: Jan 2012
Format: PDF

The Bayesian classifier is a fundamental classification technique. In this paper, the authors focus on programming Bayesian classifiers in SQL. They introduce two classifiers: Naive Bayes and a classifier based on class decomposition using K-means clustering. They consider two complementary tasks: model computation and scoring a data set. They study several layouts for tables and several indexing alternatives. They analyze how to transform equations into efficient SQL queries and they introduce several query optimizations. They conduct experiments with real and synthetic data sets to evaluate classification accuracy, query optimizations and scalability. Their Bayesian classifier is more accurate than Naive Bayes and decision trees.
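
To make the two tasks concrete, here is a minimal sketch of how Naive Bayes model computation and scoring might be expressed as SQL queries. It is not taken from the paper: the table names (train, test, nb), the column names, the two-attribute horizontal layout, and the Gaussian assumption per attribute are all illustrative assumptions, and SQLite is driven from Python only so the example can be run as is.

```python
import math
import sqlite3


def build_and_score(train_rows, test_rows):
    """Fit a Gaussian Naive Bayes model with SQL aggregates, then score test rows.

    Hypothetical schema: train(i, x1, x2, g) and test(i, x1, x2).
    Assumes every class has nonzero variance in every attribute.
    """
    conn = sqlite3.connect(":memory:")
    # Register a natural-log function so the SQL does not depend on
    # SQLite's optional built-in math functions.
    conn.create_function("py_log", 1, math.log)

    conn.execute("CREATE TABLE train (i INTEGER PRIMARY KEY, x1 REAL, x2 REAL, g TEXT)")
    conn.execute("CREATE TABLE test (i INTEGER PRIMARY KEY, x1 REAL, x2 REAL)")
    conn.executemany("INSERT INTO train VALUES (?, ?, ?, ?)", train_rows)
    conn.executemany("INSERT INTO test VALUES (?, ?, ?)", test_rows)

    # Model computation: one GROUP BY pass yields the class prior plus the
    # mean and (population) variance of each attribute per class.
    conn.execute("""
        CREATE TABLE nb AS
        SELECT g,
               COUNT(*) * 1.0 / (SELECT COUNT(*) FROM train) AS prior,
               AVG(x1) AS m1, AVG(x1 * x1) - AVG(x1) * AVG(x1) AS v1,
               AVG(x2) AS m2, AVG(x2 * x2) - AVG(x2) * AVG(x2) AS v2
        FROM train
        GROUP BY g
    """)

    # Scoring: cross join each test point with each class, compute the log
    # posterior up to a constant shared by all classes, and keep the class
    # with the highest score per point.
    query = """
        WITH scores AS (
            SELECT t.i AS i, nb.g AS g,
                   py_log(nb.prior)
                   - 0.5 * py_log(nb.v1) - (t.x1 - nb.m1) * (t.x1 - nb.m1) / (2 * nb.v1)
                   - 0.5 * py_log(nb.v2) - (t.x2 - nb.m2) * (t.x2 - nb.m2) / (2 * nb.v2)
                   AS score
            FROM test t CROSS JOIN nb
        )
        SELECT s1.i, s1.g
        FROM scores s1
        WHERE s1.score = (SELECT MAX(s2.score) FROM scores s2 WHERE s2.i = s1.i)
        ORDER BY s1.i
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Made-up illustrative data: class 'a' near (1, 1), class 'b' near (5, 5).
TRAIN = [
    (1, 1.0, 1.2, "a"), (2, 1.4, 0.8, "a"), (3, 0.6, 1.0, "a"),
    (4, 5.0, 5.2, "b"), (5, 5.4, 4.8, "b"), (6, 4.6, 5.0, "b"),
]
TEST = [(1, 1.1, 0.9), (2, 4.9, 5.1)]
```

Calling build_and_score(TRAIN, TEST) returns one (point id, predicted class) pair per test row. Summing log terms instead of multiplying probabilities is a common way to avoid numeric underflow; whether the paper's queries take the same form is not stated in this abstract.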