Online sandbox services are an interesting concept. Individuals who suspect that a file or URL may be malicious can upload it to the portal of a malware analysis service and receive an answer in short order. Anubis, Malwr (Figure A), and VirusTotal are examples of such services.
A group of researchers from Eurecom, Symantec Research Labs, and Università degli Studi di Milano decided to investigate databases from several malware analysis services, some containing millions of samples. Mariano Graziano, one of the authors and lead presenter of the team's paper Needles in a Haystack: Mining Information from Public Dynamic Analysis Sandboxes for Malware Intelligence (PDF) at USENIX Security 2015 (video), writes in an email exchange:
"We inspected the samples submitted to the Anubis sandbox. These binaries are voluntarily submitted to the sandbox by users who want more information about the behavior of Windows PE executables. This data set contains over 30 million samples collected over a period of six years."
The team used sophisticated analytics to make sense of the data. The first step was to reduce the data set from 32 million samples to 12,000. Next, the team:
- clustered data sets via binary similarity and submissions metadata;
- used binary analysis techniques to inspect samples in the clusters;
- extracted interesting features from the samples;
- and trained a classifier to automatically discover malware.
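The clustering step above can be illustrated with a minimal sketch. It is not the researchers' implementation: byte n-gram Jaccard similarity stands in for their binary-similarity technique, the submitter field is a hypothetical piece of submission metadata, and the 0.7 threshold is an arbitrary assumption.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Submission:
    sample_id: str
    submitter: str  # hypothetical metadata field, e.g. an anonymized uploader ID
    data: bytes     # raw contents of the submitted file


def ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of byte n-grams found in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(subs, threshold=0.7):
    """Group submissions that share a submitter and look alike at the byte level."""
    parent = {s.sample_id: s.sample_id for s in subs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    grams = {s.sample_id: ngrams(s.data) for s in subs}
    for a, b in combinations(subs, 2):
        similar = jaccard(grams[a.sample_id], grams[b.sample_id]) >= threshold
        if a.submitter == b.submitter and similar:
            parent[find(a.sample_id)] = find(b.sample_id)

    groups = {}
    for s in subs:
        groups.setdefault(find(s.sample_id), []).append(s.sample_id)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Made-up inputs: two near-identical builds from one uploader, one unrelated file.
    subs = [
        Submission("a", "u1", b"A" * 50 + b"hello world v1"),
        Submission("b", "u1", b"A" * 50 + b"hello world v2"),
        Submission("c", "u2", bytes(range(64))),
    ]
    print(cluster(subs))
```

Comparing every pair of samples is quadratic, so a sketch like this only suits small data sets; that cost is one reason reducing the data set first matters.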
The researchers found something of note: Malware used in several high-profile attack campaigns was found in the databases being studied. That is not unusual in itself. What stands out is how long before public disclosure the malware was submitted, a measure the researchers call Time Before Public Disclosure. Some of their findings are shown in Figure B.
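As a rough illustration of the Time Before Public Disclosure measure, the gap is simply the number of days between a sample's submission and the campaign's public disclosure. The function name and the dates below are placeholders for illustration, not values from the paper.

```python
from datetime import date


def days_before_disclosure(submitted: date, disclosed: date) -> int:
    """Days between sandbox submission and public disclosure.

    A positive result means the sample was in the database before
    the campaign was publicly known.
    """
    return (disclosed - submitted).days


if __name__ == "__main__":
    # Placeholder dates, not taken from the paper.
    submitted = date(2010, 1, 15)
    disclosed = date(2010, 3, 1)
    print(days_before_disclosure(submitted, disclosed))  # 45
```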
As to how the samples ended up in the database, the USENIX paper offers some possible explanations:
- The files were automatically collected as part of an automated network or host-based protection system.
- A security analyst may have noticed something anomalous on a computer and wanted to double-check if a suspicious file exhibited a potentially malicious behavior.
- Malware developers could have submitted an early copy of their work to verify whether it triggered any alert on the sandbox system.
The report goes on to say, "Whatever the reason, the important point is that no one paid attention to those files until it was too late."
Tracing malware development
The research highlights the ongoing and important challenges associated with malware that is caught but mislabeled, and therefore not properly associated with advanced persistent threat (APT) campaigns. To that end, the researchers focused on the detection of what they call malware development — seeing if it's possible to identify the activity of malware developers and get the word out proactively.
"We use the term 'development' in a broad sense, to include anything that is submitted by the author of the file itself," the report states. "Our main goal is to automatically detect suspicious submissions that are likely related to malware development or to a misuse of the public sandbox. We also want to use the collected information for malware intelligence."
To accomplish their goal, the researchers figured out how to distinguish malware development samples from ordinary malware samples. Although not perfect, the team's prototype implementation was able to mine the data sets and collect substantial evidence of malware development.
"Our system automatically detected the development of a diversified group of real-world malware, ranging from generic trojans to advanced rootkits," adds the USENIX report. "To better understand the distribution of the different malware families, we verified the AV labels assigned to each reported cluster."
Listed below are the types of malware the team's automated tool found within the 1,474 clusters tested:
- 45 botnets
- 1,082 trojans
- 83 backdoors
- 4 keyloggers
- 65 worms
- 21 malware development tools
When asked what this all means, Graziano surmises, "The system can be deployed as an early-warning system to flag suspicious submissions. This system can be attached transparently to any sandbox and we expect similar results from other data sets."
One objection is that the bad guys would then just stop using the online malware analysis services. "Mistakenly, people think the proposed system would stop these suspicious submissions," writes Graziano. "But, the truth is the bad guys have to interact with sandboxes and with security products in general to learn how they work in order to devise and test evasion techniques."
Graziano adds, "We believe the key message of the paper is that malware authors are abusing public sandboxes to test their code, and we do not need very sophisticated analysis tools to find them."
Note: Since the paper and USENIX Symposium, Mariano Graziano has become a security researcher in Cisco's Talos Security Intelligence and Research Group.