Similarity Detection in Source Code Using Data Mining Techniques
With the advent of the Internet, the world has shrunk into a single global village. Millions of resources are just a click away from any user, regardless of physical location. This convenience also has its shades of grey. Plagiarism is one of them, and it is rampant today. In this paper, the authors present a study of three techniques, Jaccard Similarity (JS), Cosine Similarity (CS) and Jaccard Similarity with Shingles, with respect to source code plagiarism, and compare the results obtained.
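To make the three measures concrete, the following is a minimal sketch of how they might be computed on two code fragments. It is an illustration under stated assumptions, not the implementation used in this study: the regex tokenizer, the shingle size k = 3, the function names and the handling of empty inputs are all hypothetical choices made for the example.

```python
import math
import re
from collections import Counter


def tokenize(source):
    # Assumed tokenizer: identifiers, integer literals, single punctuation marks.
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source)


def jaccard(a, b):
    # Jaccard Similarity: |A ∩ B| / |A ∪ B| over the sets of items.
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


def cosine(tokens_a, tokens_b):
    # Cosine Similarity between token-frequency vectors.
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here: an empty input matches nothing
    return dot / (norm_a * norm_b)


def shingles(tokens, k=3):
    # All contiguous runs of k tokens, so that token order is taken into account.
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard_shingles(tokens_a, tokens_b, k=3):
    # Jaccard Similarity computed over k-token shingles instead of single tokens.
    return jaccard(shingles(tokens_a, k), shingles(tokens_b, k))


x = tokenize("a = b + c;")
y = tokenize("c = b + a;")
print(jaccard(x, y), cosine(x, y), jaccard_shingles(x, y))
```

In this example the two statements use exactly the same tokens in a different order, so the token-level Jaccard and cosine scores are both at their maximum, while the shingle-based score is much lower because only one of the 3-token runs is shared.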