Efficient Computation of PCA With SVD in SQL

Executive Summary

Principal component analysis (PCA) is one of the most common dimensionality reduction techniques, with broad applications in data mining, statistics, and signal processing. In this paper, the authors study how to leverage a DBMS's computing capabilities to solve PCA. They propose a solution that combines a summarization of the data set with the correlation or covariance matrix and then solves PCA with Singular Value Decomposition (SVD). Deriving the summary matrices allows large data sets to be analyzed, since the matrices can be computed in a single pass. Computing SVD in SQL without external libraries proves to be a challenge.
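
The single-pass summarization idea can be illustrated with a minimal Python sketch. It is not the authors' SQL implementation. The names n, L, and Q (row count, per-column sums, and sums of cross-products) and all function names are assumptions made for this illustration, and a simple power iteration stands in for a full SVD, so only the leading principal component is recovered.

```python
import math

def summarize(rows):
    """One pass over the data: row count n, column sums L, cross-product sums Q."""
    n, L, Q = 0, None, None
    for x in rows:
        if L is None:
            d = len(x)
            L = [0.0] * d
            Q = [[0.0] * d for _ in range(d)]
        n += 1
        for a in range(d):
            L[a] += x[a]
            for b in range(d):
                Q[a][b] += x[a] * x[b]
    return n, L, Q

def covariance(n, L, Q):
    """Population covariance derived from the summaries alone: Q/n - (L/n)(L/n)^T."""
    d = len(L)
    return [[Q[a][b] / n - (L[a] / n) * (L[b] / n) for b in range(d)]
            for a in range(d)]

def correlation(n, L, Q):
    """Correlation matrix; assumes every column has nonzero variance."""
    cov = covariance(n, L, Q)
    d = len(L)
    sd = [math.sqrt(cov[a][a]) for a in range(d)]
    return [[cov[a][b] / (sd[a] * sd[b]) for b in range(d)] for a in range(d)]

def top_component(M, iters=200):
    """Power iteration on a symmetric matrix M (a stand-in for a full SVD).

    Assumes M is nonzero and the start vector is not orthogonal to the
    leading eigenvector. Returns (eigenvalue, unit eigenvector).
    """
    d = len(M)
    v = [1.0 / (a + 1) for a in range(d)]  # deliberately non-uniform start
    for _ in range(iters):
        w = [sum(M[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    Mv = [sum(M[a][b] * v[b] for b in range(d)) for a in range(d)]
    lam = sum(v[a] * Mv[a] for a in range(d))  # Rayleigh quotient
    return lam, v

# Toy data set (hypothetical): the second column is twice the first.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
n, L, Q = summarize(rows)
cov = covariance(n, L, Q)
lam, v = top_component(cov)
```

Because the covariance or correlation matrix is derived only from n, L, and Q, the raw rows are read once, and the decomposition then works on a small d-by-d matrix regardless of how many rows the data set has.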

  • Format: PDF
  • Size: 195.94 KB