The beauty of big data is in its breadth, its variety, and the endless possibilities it presents for cultivating new information to answer age-old questions that have eluded a business for decades.

But there is also an ugly side to the heavy inflow of Internet of Things data and other unstructured information originating on the web: What if these big data payloads carry malware or other infections that can compromise an enterprise network?

Once big data is inside the enterprise walls, processing engines such as Hadoop (which were not designed with security in mind) are ill-equipped to deal with compromised data. Hadoop's low security thresholds are tested further by the wide-open exposure of millions of mobile devices that pass data back and forth across the internet every day, traffic that can include malware and denial-of-service attacks.

Concerned about malware and the large amounts of security-compromising big data entering from the internet, Andrew Hay, senior security research lead at OpenDNS, talks about OpenGraphiti, an internal tool for malware detection and data visualization that his company employs and now offers as a free tool for data scientists and others who need to check the safety of incoming data.

“This data visualization engine creates visibility so data scientists can take loosely related data and assess it for patterns in order to detect malware or compromised data,” says Hay.

In the case of big data, this analytics tool works with enterprise firewalls, intrusion detection devices, and any other data sources attached to the network through which incoming big data streams flow. It detects the various business rule sets for security at firewall and intrusion device checkpoints to determine the full range of security provisions that the enterprise has set, and then analyzes the sources of incoming data to determine which sources trigger reactions from the enterprise’s security rules most often. The most active “trigger points” (which could be devices, websites, etc.) become the “hot items” of follow-up forensics and analytics to determine what it is about these data sources that triggers so many enterprise security alerts. In some cases, the threat of malware or a virus invasion may prove minimal after examination; in other cases, the data integrity and safety of a data source that is being used by big data analytics might be questionable.
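To make the “trigger point” ranking concrete, here is a minimal Python sketch of the counting step alone. It is not OpenGraphiti's API or the method OpenDNS uses; the `Alert` record, its field names, the `hot_items` function, and the sample data are all hypothetical, and the sketch assumes firewall and intrusion detection alerts have already been collected into one list.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Alert(NamedTuple):
    """One security-rule hit. All field names are hypothetical."""
    checkpoint: str  # where the rule fired, e.g. "firewall" or "ids"
    rule_id: str     # identifier of the rule that fired
    source: str      # device, website, or other origin of the data


def hot_items(alerts: Iterable[Alert], top_n: int = 3) -> List[Tuple[str, int]]:
    """Rank data sources by how many security alerts they triggered."""
    counts = Counter(alert.source for alert in alerts)
    return counts.most_common(top_n)


# Made-up alerts standing in for firewall and intrusion detection output.
sample_alerts = [
    Alert("firewall", "FW-101", "sensor-17"),
    Alert("ids", "IDS-7", "sensor-17"),
    Alert("firewall", "FW-101", "example.com"),
    Alert("ids", "IDS-9", "sensor-17"),
    Alert("firewall", "FW-230", "example.com"),
    Alert("ids", "IDS-7", "tablet-04"),
]

for source, hits in hot_items(sample_alerts, top_n=2):
    print(f"{source}: {hits} alerts")
```

The sources at the top of this list would be the candidates for follow-up forensics. A high count alone does not establish that a source is malicious, which is why the examination step described above still matters.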

“Many enterprises are still relying on log management reviews to find these pockets of security-threatening data, so we still need to improve best practices in procuring quality big data from a security standpoint,” says Hay. “But some enterprises are getting to where they recognize that they need other data screening tools and approaches for their big data. In financial services, for example, we find that there are institutions that are taking in their data and then rebuilding their own data warehouses and analytics.”

The objective for those institutions is to handle the security checks and the sanitization of data in-house. In contrast, an OpenDNS data visualization tool that shows the shape of data, malware and other dire security threats included, before it even comes in the door can preclude much of the effort that otherwise has to occur in data centers, where security sanitization can be significantly more labor-intensive and hands-on.

“We are developing a user community whose input is very useful for building out new capabilities in the tool, and we already see the use cases that are possible,” says Hay.

Now, it’s just a question of more enterprises getting on board with network strategies that complement the big data work that is already going on in database, server, and end-user areas.