Fast PCA Computation in a DBMS With Aggregate UDFs and LAPACK
Efficient and scalable execution of numerical methods inside a DBMS is difficult as its architecture is not suited for intense numerical computations. The authors study computing Principal Component Analysis (PCA) on large data sets via Singular Value Decomposition (SVD). Given the difficulty to program and optimize numerical methods on an existing DBMS, they explore an alternative reusability approach: calling the well-known numerical library LAPACK. Thus, they study several alternatives to summarize the data set with aggregate User-Defined Functions (UDFs) and how to efficiently call SVD numerical methods available in LAPACK via Stored Procedures (SPs).