Malware Detection Using Statistical Analysis of Byte-Level File Content
Commercial anti-virus software is unable to provide protection against newly launched (a.k.a. "zero-day") malware. In this paper, the authors propose a novel malware detection technique based on the analysis of byte-level file content. The novelty of the approach, compared with existing content-based mining schemes, is that it does not memorize specific byte sequences or strings appearing in the actual file content. The technique is non-signature-based and therefore has the potential to detect previously unknown and zero-day malware. The authors compute a wide range of statistical and information-theoretic features in a block-wise manner to quantify the byte-level file content. They then apply standard data mining algorithms to classify the file content of every block as normal or potentially malicious.
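As a rough illustration of block-wise feature extraction, the following minimal Python sketch splits file content into fixed-size blocks and computes a few per-block features. The abstract does not list the actual features or the block size, so the choices here (Shannon entropy, byte mean, byte variance, number of distinct byte values, and a 1024-byte block) are assumptions made for illustration only, and all function names are hypothetical. The classification step is omitted.

```python
import math
from collections import Counter

# Hypothetical block size; the abstract does not specify one.
BLOCK_SIZE = 1024


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split raw file content into consecutive blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def byte_entropy(block: bytes) -> float:
    """Shannon entropy of the block's byte distribution, in bits per byte (0 to 8)."""
    if not block:
        return 0.0
    n = len(block)
    counts = Counter(block)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def block_features(block: bytes) -> dict:
    """Illustrative statistical features for one block (assumed, not the paper's set)."""
    n = len(block)
    mean = sum(block) / n if n else 0.0
    variance = sum((b - mean) ** 2 for b in block) / n if n else 0.0
    return {
        "entropy": byte_entropy(block),
        "mean": mean,
        "variance": variance,
        "distinct": len(set(block)),
    }


def file_features(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """One feature dictionary per block, in file order."""
    return [block_features(b) for b in split_blocks(data, block_size)]
```

Each resulting feature dictionary could then be turned into a numeric vector and passed to any standard classifier to label the corresponding block. Because the features summarize the byte distribution rather than record specific byte sequences, nothing in this representation acts as a signature.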