Storing Matrices on Disk: Theory and Practice Revisited
The authors consider the problem of storing arrays on disk to support scalable data analysis involving linear algebra. They propose Linearized Array B-tree, or LAB-tree, which supports flexible array layouts and automatically adapts to varying sparsity across parts of an array and over time. They reexamine the B-tree splitting strategy for handling insertions and the flushing policy for batching updates, and show that common practices may in fact be suboptimal. Through theoretical and empirical studies, they propose alternatives with good theoretical guarantees and/or practical performance.