Association for Computing Machinery
The sheer volume of new malware found each day is growing at an exponential pace. This growth has created a need for automatic malware triage techniques that determine what malware is similar, what malware is unique, and why. In this paper, the authors present BitShred, a system for large-scale malware similarity analysis and clustering, and for automatically uncovering semantic inter- and intra-family relationships within clusters. The key idea behind BitShred is using feature hashing to dramatically reduce the high-dimensional feature spaces that are common in malware analysis.
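To make the idea of feature hashing concrete, the following is a minimal illustrative sketch, not BitShred's actual implementation. It assumes byte n-grams as the features, a 1024-bit fingerprint, BLAKE2b as the hash function, and Jaccard similarity over the resulting bit vectors; none of these choices is stated in the abstract, and the function names are hypothetical.

```python
import hashlib


def ngrams(data: bytes, n: int = 4):
    """Yield every contiguous n-byte window of the input (assumed feature type)."""
    for i in range(len(data) - n + 1):
        yield data[i:i + n]


def hash_features(data: bytes, num_bits: int = 1024, n: int = 4) -> int:
    """Fold the n-gram feature set into a fixed-width bit vector.

    The bit vector is stored as a Python int. Each feature sets one bit,
    so the fingerprint size is num_bits regardless of how many distinct
    features the sample contains. Distinct features may collide.
    """
    fingerprint = 0
    for gram in ngrams(data, n):
        digest = hashlib.blake2b(gram, digest_size=8).digest()
        index = int.from_bytes(digest, "big") % num_bits
        fingerprint |= 1 << index
    return fingerprint


def jaccard(a: int, b: int) -> float:
    """Jaccard similarity of two bit vectors: |a AND b| / |a OR b|."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # two empty fingerprints are treated as identical
    return bin(a & b).count("1") / union


if __name__ == "__main__":
    # Hypothetical byte strings standing in for malware samples.
    sample_a = b"push ebp; mov ebp, esp; sub esp, 0x40; call decrypt"
    sample_b = b"push ebp; mov ebp, esp; sub esp, 0x48; call decrypt"
    fp_a = hash_features(sample_a)
    fp_b = hash_features(sample_b)
    print(f"similarity: {jaccard(fp_a, fp_b):.3f}")
```

In this sketch, comparing two samples costs a few bitwise operations on fixed-size fingerprints rather than a set comparison over the original feature space, which is the kind of dimensionality reduction the abstract describes. The trade-off is that hash collisions can make unrelated features share a bit, so similarity scores are approximate.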