Collective Intelligence: Can it save anti-virus apps?

No, it's not the "Borg" from Star Trek. But, Collective Intelligence uses the same concept and could revolutionize the way anti-virus applications work.

I have been following a University of Michigan project called CloudAV Architecture: N-Version Anti-Virus (pdf). Like most of us, researchers at the University of Michigan realize traditional anti-virus applications are not deterrents. Let's take a look at why.

Computer-based anti-virus

Typical anti-virus programs reside on the local computer and consist of two parts, an intercept driver and a detection engine. The intercept driver tests objects using signature files, heuristics, and behavioral analysis. If something questionable is found, the driver sends pertinent information to the detection engine, which then checks for matches in the signature database.

The signature database is the weak link. Whether the detection engine finds a match depends on how up-to-date the database is, which in turn depends on how quickly threat researchers produce a signature file and on when the application last updated.
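To make the weak link concrete, here is a minimal sketch of a signature lookup. It assumes a signature is nothing more than a SHA-256 digest of a known-bad file, which is a simplification (real products also use pattern and heuristic signatures). The names KNOWN_BAD and is_known_malware are hypothetical and not taken from any product.

```python
import hashlib

# Hypothetical local signature database: digests of files already
# catalogued by threat researchers. Real databases are far richer.
KNOWN_BAD = {hashlib.sha256(b"malicious payload v1").hexdigest()}


def is_known_malware(data: bytes, signatures: set) -> bool:
    """Return True only if the exact file digest is in the database."""
    return hashlib.sha256(data).hexdigest() in signatures


# A catalogued sample is caught; a slightly altered variant is missed
# until someone adds its signature and the client downloads the update.
catalogued = is_known_malware(b"malicious payload v1", KNOWN_BAD)
variant = is_known_malware(b"malicious payload v2", KNOWN_BAD)
print(catalogued, variant)
```

The variant slips through not because the engine is faulty, but because the database has not caught up yet.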

On-line anti-virus scanners

On-line anti-virus scanners are being touted as an improvement over resident anti-virus applications. But, they have several problems as well:

  • No real-time protection, only on-demand scanning.
  • No protection if the computer is disconnected from the Internet.
  • A semi-static signature database is still used, with accuracy depending on the last time it was updated.

Understanding the problems with traditional approaches as well as on-line scanners, the University of Michigan research team determined a new approach was needed. Why not make anti-virus an intelligent Software as a Service (SaaS) offering and gain the following benefits?

  • Improved detection of malware: This model increases the likelihood of malware being found, because multiple detection engines working in parallel can be used.
  • Local anti-virus vulnerabilities are not a problem: Moving the anti-virus engine to the cloud eliminates the ability of malware to manipulate the client anti-virus application.
  • Real-time signature definitions: Data from client computers are continually uploaded to the detection engine's database, providing real-time answers to queries from other host computers that may be encountering the same malware.
  • Small footprint on host: Moving malware detection off the client and into the cloud simplifies client software, extending anti-virus protection to devices with limited processing power (smart phones).
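The first benefit, several detection engines working in parallel, can be sketched roughly as follows. This is an illustration under stated assumptions, not the CloudAV implementation: the three toy engines, the function scan_with_all, and the threshold parameter are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


# Three toy "engines" standing in for independent vendor products.
# Each takes file bytes and returns True if it considers them malicious.
def engine_a(data: bytes) -> bool:
    return b"EVIL" in data


def engine_b(data: bytes) -> bool:
    return data.startswith(b"MZ") and b"dropper" in data


def engine_c(data: bytes) -> bool:
    return data.count(b"\x90") > 8


ENGINES = [engine_a, engine_b, engine_c]


def scan_with_all(data: bytes, engines, threshold: int = 1) -> bool:
    """Run every engine concurrently; flag the file if at least
    `threshold` engines report it as malicious."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        verdicts = list(pool.map(lambda engine: engine(data), engines))
    return sum(verdicts) >= threshold


print(scan_with_all(b"MZ...dropper...", ENGINES))
print(scan_with_all(b"plain text file", ENGINES))
```

With a threshold of 1, a file is flagged as soon as any single engine objects, which maximizes detection at the cost of more false positives. Raising the threshold trades the other way.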

Besides being different from traditional anti-virus applications, CloudAV is not a cloud-based anti-virus scanner. Unlike scanners, CloudAV creates an active and continuing relationship between client computers and servers that house the CloudAV detection engines.

The theory sounds good, but I can't test it. It seems CloudAV is only in use on the University of Michigan campus.

Panda Security

Last week, Panda Security introduced Panda Cloud Anti-virus for consumers and Panda Cloud Protection for small-to-medium businesses. Juan Santana, CEO of Panda Security, had this to say:

"The launch of Panda Cloud Protection and Panda Cloud Antivirus represents an evolutionary step in our ability to combat cybercrime, and one we're confident the industry will follow. Panda's new and improved security services leverage our extensive R&D in cloud computing to keep our business and home users protected with as little effort and investment as possible."

On the surface, both programs appear similar to CloudAV. They use cloud-based anti-virus detection engines and thin clients on the host computers. For this article, I would like to focus on Cloud Anti-virus.

Thin client

After installation, the Cloud Anti-virus thin client immediately runs a complete scan of the computer, making an inventory of existing processes. If questionable objects are found, the thin client defers to the Panda Security database for removal instructions.

Once the catalogue is established, the thin client uses the following three types of scans to maintain an accurate inventory and check out new objects:

  • On-access scan: The maximum-priority scan applied to objects right before they are executed. The files are intercepted, prevented from running, and disinfected if found to be malicious.
  • Pre-fetch scan: A joint local and cloud scan of a file that is currently idle, but is expected to be executed shortly. This type of scan only takes place when performance is not impacted.
  • Background scan: The lowest-priority scan, which runs only when the computer is idle, so as not to impair performance.
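One way to picture the three scan types is as a priority queue. The sketch below is my own assumption about how such scheduling could work, not Panda's code. ScanPriority, ScanQueue, and the idle and low_load flags are hypothetical names.

```python
import heapq
import itertools
from enum import IntEnum


class ScanPriority(IntEnum):
    ON_ACCESS = 0    # file is about to execute: always scan now
    PRE_FETCH = 1    # file expected to run soon: scan if load is low
    BACKGROUND = 2   # everything else: scan only when idle


class ScanQueue:
    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order

    def submit(self, path: str, priority: ScanPriority) -> None:
        heapq.heappush(self._heap, (priority, next(self._order), path))

    def next_scan(self, idle: bool, low_load: bool):
        """Return (priority, path) for the next permitted scan, or None."""
        if not self._heap:
            return None
        priority, _, path = self._heap[0]
        if priority == ScanPriority.PRE_FETCH and not low_load:
            return None
        if priority == ScanPriority.BACKGROUND and not idle:
            return None
        heapq.heappop(self._heap)
        return ScanPriority(priority), path


queue = ScanQueue()
queue.submit("archive.zip", ScanPriority.BACKGROUND)
queue.submit("setup.exe", ScanPriority.ON_ACCESS)
print(queue.next_scan(idle=False, low_load=False))
```

Even though the background job was submitted first, the on-access scan is served first, and the background job waits until the machine reports itself idle.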

The following slide shows the results of a scan:

Collective Intelligence

Collective Intelligence is Panda Security's term for the servers that provide the anti-virus detection engines. As information is uploaded from the thin clients, it is analyzed and categorized by the Collective Intelligence technology.

If a new malware strain or a variant of an existing strain is discovered, the servers will create and send detection/removal instructions to each client node. To get an idea as to what is happening with Collective Intelligence, Panda Security has created a real-time monitor on their Web site.

What information is being uploaded

I asked Sean-Paul Correll, a threat researcher for Panda Security, what exactly is uploaded to the Collective Intelligence servers. Mr. Correll explained that the thin client builds what they call a "reverse signature", a small file made up of the data needed to recognize malware signatures, specifically:

  • Cloud heuristics
  • How the executable file interacts with the operating system
  • Alterations to the system's inventory fingerprint

Before the data is sent to the Collective Intelligence servers, it is hashed to ensure privacy and authenticity of the message.
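I was not told which algorithms are used, so the following is only a sketch of the general idea. A plain hash lets the client send a fingerprint of a file rather than its contents, while proving that a message is authentic normally requires a keyed construction such as an HMAC. The key, field names, and functions below are all hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical key shared by client and server. A real deployment
# would provision and protect keys very differently.
SHARED_KEY = b"demo-key-not-for-production"


def build_report(file_bytes: bytes, behaviors) -> dict:
    """Describe a file by its digest and observed behaviors,
    without including the file contents themselves."""
    return {
        "file_sha256": hashlib.sha256(file_bytes).hexdigest(),
        "behaviors": sorted(behaviors),
    }


def sign_report(report: dict, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over a canonical JSON encoding."""
    payload = json.dumps(report, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_report(report: dict, tag: str, key: bytes) -> bool:
    """Server side: recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign_report(report, key), tag)


report = build_report(b"some executable bytes", ["writes_registry"])
tag = sign_report(report, SHARED_KEY)
print(verify_report(report, tag, SHARED_KEY))
```

If anyone alters the report in transit, the recomputed tag no longer matches and the server can discard the message.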

Off-line operation

I was concerned about whether computers would be adequately protected when off-line. Mr. Correll explained that:

"Computers are still protected while not connected to the Internet. Cloud Anti-virus keeps a local copy of the Collective Intelligence cache for off-line operation."

I then asked if it wasn't redundant to have the thin client check the Collective Intelligence, when there was a local copy of the cache. Mr. Correll clarified it for me:

"The number of new malware signatures amounts to approximately 150,000 a day. That many can be processed by the Collective Intelligence servers, allowing real-time queries by the thin clients. But, it would be near-impossible to keep the local cache of every thin client that up to date."
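The division of labor Mr. Correll describes, a live query when connected and a local cache otherwise, might look something like the sketch below. This is my own illustration under stated assumptions. VerdictLookup, the query_cloud callback, and the "unknown" verdict are hypothetical, and the real client's logic is not public.

```python
class VerdictLookup:
    """Ask the cloud when on-line; fall back to a local cache when not."""

    def __init__(self, local_cache: dict, query_cloud):
        self._cache = local_cache        # file hash -> verdict string
        self._query_cloud = query_cloud  # callable: hash -> verdict or None

    def lookup(self, file_hash: str, online: bool) -> str:
        if online:
            verdict = self._query_cloud(file_hash)
            if verdict is not None:
                # Remember the fresh answer for later off-line use.
                self._cache[file_hash] = verdict
                return verdict
        # Off-line, or the cloud had no answer: use what we have locally.
        return self._cache.get(file_hash, "unknown")


# Stand-in for the server side, which knows about a brand-new threat.
cloud_db = {"abc123": "malicious", "def456": "clean"}
client = VerdictLookup({"def456": "clean"}, cloud_db.get)

print(client.lookup("abc123", online=False))  # not cached yet
print(client.lookup("abc123", online=True))   # fetched and cached
print(client.lookup("abc123", online=False))  # now served from cache
```

The local cache is a safety net rather than a replacement. It only knows what the client has already seen or been sent, which is why the live query still matters.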

Initial testing

A while back, I posted about a ComputerWorld article that was trying to determine whether using free anti-virus software was worthwhile. At that time, Panda Security's Cloud Anti-virus was also tested, but not written about. The company selected by ComputerWorld to run the tests disclosed why:

"The program's (Cloud Anti-virus) design also meant that it could not work with our current method of proactive-protection testing, which requires us to use two- and four-week-old signature databases to simulate how well an antivirus tool performs."

A later ComputerWorld article offered these results:

"If its excellent showing at detecting malware in's zoo of half a million samples is any indication, the approach works. Panda's app produced an impressive 99.4 percent overall detection rate."

The next-best detection rate, 98.9 percent, was achieved by Avira AntiVir Personal.

Final thoughts

It appears using Collective Intelligence will give anti-virus applications better tools for fighting malware. Let's hope so.

I would like to thank Ms. Amy Ziari of Bateman Group and Mr. Sean-Paul Correll of Panda Security for their patience and help with this article.