University of California
When software developers modify one or more files in a large code base, they must also identify and update other related files. Many file dependencies can be detected by mining the development history of the code base: in essence, groups of related files are revealed by the logs of previous workflows. From data of this form, the authors show how to detect dependent files by solving a problem in binary matrix completion. They explore different latent variable models (LVMs) for this problem, including Bernoulli mixture models, exponential family PCA, restricted Boltzmann machines, and fully Bayesian approaches.
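To make the matrix-completion framing concrete, the following is a minimal sketch of one of the listed model families, a Bernoulli mixture model, fitted with plain EM on a toy history. It is not the authors' implementation. The toy matrix `X` (rows are past transactions, columns are files), the function names `fit_bernoulli_mixture` and `predict_related`, and all hyperparameters are assumptions made for illustration.

```python
import numpy as np

def fit_bernoulli_mixture(X, n_components=2, n_iter=200, seed=0, eps=1e-6):
    """Fit a mixture of multivariate Bernoullis with EM.

    X is a binary (n_transactions, n_files) matrix; X[i, j] = 1 means
    file j was modified in past transaction i.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(n_components, 1.0 / n_components)         # mixing weights
    mu = rng.uniform(0.25, 0.75, size=(n_components, d))   # per-file probabilities
    for _ in range(n_iter):
        # E-step: responsibilities from log p(x_i, z = k), normalized per row
        log_p = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T + np.log(pi)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and Bernoulli means (lightly smoothed)
        nk = resp.sum(axis=0)
        pi = (nk + eps) / (nk + eps).sum()
        mu = (resp.T @ X + eps) / (nk[:, None] + 2 * eps)
        mu = np.clip(mu, eps, 1 - eps)
    return pi, mu

def predict_related(pi, mu, modified):
    """Return P(file j is modified | files in `modified` are modified) for all j.

    Files not listed in `modified` are treated as unobserved and marginalized out.
    """
    log_post = np.log(pi) + np.log(mu[:, modified]).sum(axis=1)
    log_post -= log_post.max()
    post = np.exp(log_post)
    post /= post.sum()
    return post @ mu

# Toy development history (hypothetical): files 0-2 tend to change together,
# and files 3-5 tend to change together.
X = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1],
], dtype=float)

pi, mu = fit_bernoulli_mixture(X)
scores = predict_related(pi, mu, modified=[0])  # developer has edited file 0
ranking = np.argsort(-scores)                   # most likely related files first
```

In this sketch the query file itself also receives a score, so a caller would normally drop the entries in `modified` from `ranking` before presenting suggestions. The same prediction interface could in principle sit on top of the other model families named in the abstract, with only the fitting step changed.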