Scalable Detection of Similar Code: Techniques and Applications
Similar code, also known as cloned code, is common in large software systems. Studies show that code duplication can increase software maintenance costs and lead to more defects. Detecting similar code and tracking its migration therefore have many important applications, including program understanding, refactoring, optimization, and bug detection. This dissertation presents novel, general techniques for detecting and analyzing both syntactic and semantic code clones. The techniques detect clones scalably and accurately under various definitions of similarity, including those based on trees, graphs, and functional behavior. They can also help reduce software defects and promote code reuse.