Building Statistical Models and Scoring With UDFs
Multidimensional statistical models are generally computed outside a relational DBMS, which requires exporting data sets. This paper explains how fundamental multidimensional statistical models are computed inside the DBMS in a single table scan by exploiting SQL and User-Defined Functions (UDFs). The techniques described here are used in a commercial data mining tool, Teradata Warehouse Miner. Specifically, the authors explain how correlation, linear regression, principal component analysis (PCA), and clustering are integrated into the Teradata DBMS. Two major database processing tasks are discussed: building a model and scoring a data set based on a model.
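To make the single-scan idea concrete, the following is a minimal Python sketch, not the authors' SQL or UDF code. It assumes that the model can be derived from summary statistics accumulated in one pass over the rows: the row count, the per-dimension sums, and the sums of cross-products. The names `summarize`, `correlation`, `n`, `L`, and `Q` are illustrative assumptions and do not appear in the abstract.

```python
import math


def summarize(rows):
    """Accumulate summary statistics in one pass (assumed approach, see lead-in).

    Returns n (row count), L (per-dimension sums) and Q (sums of cross-products).
    """
    n = 0
    L = None
    Q = None
    for x in rows:
        if L is None:
            # Size the accumulators from the first row.
            d = len(x)
            L = [0.0] * d
            Q = [[0.0] * d for _ in range(d)]
        n += 1
        for a in range(d):
            L[a] += x[a]
            for b in range(d):
                Q[a][b] += x[a] * x[b]
    if n == 0:
        raise ValueError("no rows to summarize")
    return n, L, Q


def correlation(n, L, Q):
    """Derive the Pearson correlation matrix from n, L and Q alone.

    No second pass over the data is needed. A dimension with zero
    variance would cause a division by zero here.
    """
    d = len(L)
    rho = [[0.0] * d for _ in range(d)]
    for a in range(d):
        for b in range(d):
            num = n * Q[a][b] - L[a] * L[b]
            den = (math.sqrt(n * Q[a][a] - L[a] ** 2)
                   * math.sqrt(n * Q[b][b] - L[b] ** 2))
            rho[a][b] = num / den
    return rho


# Hypothetical two-dimensional data set, for illustration only.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.5), (4.0, 8.0)]
n, L, Q = summarize(rows)
rho = correlation(n, L, Q)
```

In this sketch the data is read exactly once by `summarize`; `correlation` then works only on the small summary values, which is what would make an in-database aggregate a natural fit under the stated assumption.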