International Journal of Advanced Research in Computer Science and Software Engineering (IJARCSSE)
Spam is any unwanted message, especially advertisements and fraud schemes. The average cost of spam per employee per year at 82 of the Fortune 500 companies is estimated to be $1934. This paper proposes a novel process called G-LDA, which combines concepts from Latent Dirichlet Allocation (LDA) with genetic evolution techniques. The process builds sets of words that occur with high frequency in spam email and evolves them over successive generations. The method was tested on the Enron spam corpus, and the phrases evolved through the generations showed significant improvement.
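To make the idea of evolving high-frequency word sets concrete, the following is a minimal sketch and not the G-LDA procedure itself. It omits the LDA component entirely and assumes a simple fitness function (the mean fraction of spam emails that contain each distinct word of a phrase), elitist selection, one-point crossover, and random mutation. The names `TOY_SPAM`, `fitness`, and `evolve`, all parameter values, and the toy messages are hypothetical and are not taken from the Enron corpus.

```python
import random

# Hypothetical toy messages for illustration only (not from the Enron corpus).
TOY_SPAM = [
    "win free money now",
    "free offer click now",
    "claim your free prize now",
    "cheap meds free shipping",
    "click now to win a prize",
]


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real preprocessing may differ)."""
    return text.lower().split()


def fitness(phrase, spam_docs):
    """Mean fraction of spam emails containing each distinct word of the phrase.

    Repeated words are counted once but still divide by the full phrase
    length, so phrases that repeat a word are penalized.
    """
    doc_sets = [set(tokenize(d)) for d in spam_docs]
    total = sum(sum(w in s for s in doc_sets) / len(doc_sets) for w in set(phrase))
    return total / len(phrase)


def evolve(spam_docs, phrase_len=3, pop_size=20, generations=30, seed=0):
    """Evolve word tuples toward high spam frequency; assumes phrase_len >= 2."""
    rng = random.Random(seed)
    vocab = sorted({w for d in spam_docs for w in tokenize(d)})
    population = [tuple(rng.sample(vocab, phrase_len)) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda p: fitness(p, spam_docs), reverse=True)
        history.append(fitness(population[0], spam_docs))
        survivors = population[: pop_size // 2]  # elitism: keep the top half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, phrase_len)  # one-point crossover
            child = list(a[:cut] + b[cut:])
            if rng.random() < 0.2:  # mutation: swap in a random vocabulary word
                child[rng.randrange(phrase_len)] = rng.choice(vocab)
            children.append(tuple(child))
        population = survivors + children
    population.sort(key=lambda p: fitness(p, spam_docs), reverse=True)
    history.append(fitness(population[0], spam_docs))
    return population[0], history
```

Because the top half of each generation is carried over unchanged, the best fitness in `history` can never decrease, which is one simple way to observe improvement across generations in a sketch like this.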