Towards Efficient Analysis for Malware in the Wild
The authors propose two novel techniques for reducing the workload of malware analysis. The first, restricted instruction, accelerates finding the Longest Common Subsequence (LCS) between the machine code instruction sequences of malware samples. The second, probabilistic disassembly, finds the most probable disassembly of a binary stream without relying on clues such as debug symbols or information about imported functions. By combining the two techniques with their generic unpacker, they built an automatic malware classification system. Given an unknown malware program, the system lets malware analysts find the known malware program most similar to it, and even estimate which instructions the two programs share and which differ.
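
For orientation, the LCS-based comparison mentioned above can be sketched with the textbook dynamic-programming algorithm. This is not the authors' restricted-instruction technique, which the summary does not describe in detail. The mnemonic lists, the similarity score, and the function names (lcs, similarity, most_similar) are illustrative assumptions.

```python
from typing import Dict, List, Sequence


def lcs(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return one longest common subsequence of a and b (textbook O(n*m) DP)."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover the common instructions.
    common: List[str] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return common


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Hypothetical score in [0, 1]: twice the LCS length over the total length."""
    if not a and not b:
        return 1.0
    return 2 * len(lcs(a, b)) / (len(a) + len(b))


def most_similar(unknown: Sequence[str], known: Dict[str, Sequence[str]]) -> str:
    """Return the name of the known sample whose sequence scores highest."""
    return max(known, key=lambda name: similarity(unknown, known[name]))


# Made-up instruction sequences, reduced to mnemonics for illustration.
unknown = ["push", "mov", "xor", "call", "ret"]
known = {
    "sample_a": ["push", "mov", "add", "call", "ret"],
    "sample_b": ["nop", "nop", "ret"],
}
best = most_similar(unknown, known)
shared = lcs(unknown, known[best])
```

Instructions outside the recovered subsequence are the candidates for "different" instructions between the two samples. The quadratic cost of this plain version on long instruction sequences is the kind of workload the restricted-instruction technique is said to reduce.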