The Distribution of Program Sizes and Its Implications: An Eclipse Case Study
Source: Tsinghua University
A large software system is often composed of many interrelated programs of different sizes. Using the public Eclipse dataset, the authors replicate a previous study on the distribution of program sizes. The results confirm that program sizes follow a lognormal distribution. The authors also investigate the implications of the program size distribution for size estimation and quality prediction. They find that the nature of the size distribution can be used to estimate the size of a large Java system. They also find that a small percentage of the largest programs accounts for a large percentage of defects, and that the number of defects across programs follows a Weibull distribution when the programs are ranked by size.
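
As a rough illustration of how a lognormal size distribution could support size estimation, the sketch below fits a lognormal to a sample of program sizes and extrapolates a total system size from the number of programs. It is not the authors' procedure: the function names `fit_lognormal` and `estimate_total_size` are hypothetical, the data are synthetic rather than taken from the Eclipse dataset, and the estimator simply multiplies the program count by the fitted lognormal mean.

```python
import math
import random
import statistics


def fit_lognormal(sizes):
    """Return (mu, sigma), the mean and standard deviation of the log-sizes.

    These are the maximum-likelihood parameters of a lognormal fit.
    """
    logs = [math.log(s) for s in sizes]
    return statistics.fmean(logs), statistics.pstdev(logs)


def estimate_total_size(n_programs, mu, sigma):
    """Expected total size: n times the lognormal mean exp(mu + sigma^2 / 2)."""
    return n_programs * math.exp(mu + sigma ** 2 / 2)


# Synthetic program sizes (assumed parameters, NOT Eclipse measurements).
rng = random.Random(0)
sizes = [rng.lognormvariate(4.0, 1.0) for _ in range(5000)]

# Fit on a sample of 500 programs, then extrapolate to all 5000.
sample = rng.sample(sizes, 500)
mu, sigma = fit_lognormal(sample)
estimate = estimate_total_size(len(sizes), mu, sigma)

print(f"fitted mu={mu:.2f}, sigma={sigma:.2f}")
print(f"estimated total size: {estimate:.0f}")
print(f"actual total size:    {sum(sizes):.0f}")
```

The comparison between the estimated and actual totals only shows that the extrapolation is plausible when the lognormal assumption holds; how well it works on a real Java system depends on how representative the sampled programs are.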