Association for Computing Machinery
Spam, or unsolicited email, adversely affects regular email communication on the Internet. Spammers use botnets, groups of computers that are infected with hidden malicious programs and controlled remotely, to distribute spam and conceal their identities. In this paper, the authors investigate image spam with data mining techniques in order to reveal the common sources of unsolicited emails. To identify these origins, a two-stage clustering method groups visually similar spam images by exploring their visual features, including color features, layout features, text layout, and background textures. They test the proposed approach under different settings and combinations of features and measure its performance with a modified F-measure.
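
The abstract does not say how the F-measure is modified. As a point of reference only, the standard clustering F-measure (each true source is matched to its best-scoring cluster, and the scores are weighted by source size) might be sketched as follows. The function name `clustering_f_measure` and the example labels are hypothetical, and this is the unmodified textbook measure, not the authors' variant.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_ids):
    """Standard clustering F-measure (assumption: NOT the paper's modified form).

    true_labels: known source of each spam image (hypothetical ground truth).
    cluster_ids: cluster assigned to each image by some clustering method.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have equal length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("at least one item is required")

    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    # Number of items shared by each (class, cluster) pair.
    overlap = Counter(zip(true_labels, cluster_ids))

    # Best F-score achieved by any cluster for each true class.
    best = {cls: 0.0 for cls in class_sizes}
    for (cls, clu), n_ij in overlap.items():
        precision = n_ij / cluster_sizes[clu]
        recall = n_ij / class_sizes[cls]
        f_score = 2 * precision * recall / (precision + recall)
        best[cls] = max(best[cls], f_score)

    # Weight each class's best score by its share of all items.
    return sum(class_sizes[cls] / n * best[cls] for cls in class_sizes)


if __name__ == "__main__":
    sources = ["botnet_a", "botnet_a", "botnet_b", "botnet_b"]
    print(clustering_f_measure(sources, [0, 0, 1, 1]))
    print(clustering_f_measure(sources, [0, 0, 0, 0]))
```

A clustering that separates the two hypothetical sources perfectly scores 1.0, while merging everything into one cluster is penalized through lower precision.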