Today, large enterprises face the challenges of Bring Your Own Device (BYOD), untameable email inboxes, and an ever-growing threat landscape that includes spam and malware, all while managing email repositories on the scale of most enterprise Big Data projects.

Large-scale email management can be a Big Data exercise if you consider the following:

  • Email retention policies (doubly important in organizations subject to compliance programs).
  • Tracking and archiving customer email, such as messages sent to a shared address like a jobs or sales account.
  • Securing email archives so that only a select group of administrators can access, view, and query them.

Multiply these considerations across tens of thousands of email inboxes, and it becomes time for Big Data analytics and related tools to step up and do their part in the email fight.

Big Data and the email inbox

I had a chance to speak with David Will, CEO; Nick Martin, software engineer; and Claire Willet, market development manager, of Riparian Data, the developers behind Gander, an enterprise email management platform that uses Big Data analytics. Gander sits in the cloud between corporate Exchange email servers and the Internet. It is currently in private beta, with a public beta release planned for this year’s SXSW. Its pitch is using “big data to make small inboxes.” Riparian Data’s development efforts are an intriguing mix of Big Data analytics and email productivity wrapped up in the cloud.

For corporate email users, Gander sorts and manages email in the cloud and supports both mobile devices and PCs.

Gander is Riparian Data’s move to commercialize Timberwolf, an open source email analytics solution that pulls data from Microsoft Exchange into HBase. Once the email data is in HBase, it becomes possible to run queries against it for data analytics. Pulling email data into HBase opens up the following possibilities:

  • Social analytics
  • Social network graphing
  • Mapping of correlations between the unstructured data in email and other HBase data
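As a small illustration of the social network graphing idea, the following Python sketch counts sender-to-recipient edges from message headers using only the standard library. It is not Timberwolf’s code and does not touch HBase: it assumes the messages are available as raw RFC 5322 text, and `build_email_graph` is a hypothetical helper. In a Timberwolf setup, the same kind of counting would presumably run over the email data already loaded into HBase.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses, parseaddr


def build_email_graph(raw_messages):
    """Hypothetical helper: count (sender, recipient) edges across raw messages.

    Each edge weight is the number of messages in which the sender addressed
    that recipient in the To or Cc header.
    """
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        sender = parseaddr(msg.get("From", ""))[1].lower()
        if not sender:
            continue  # skip messages without a usable From address
        # getaddresses splits comma-separated header values into (name, addr) pairs.
        recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
        for _, addr in recipients:
            if addr:
                edges[(sender, addr.lower())] += 1
    return edges


if __name__ == "__main__":
    sample = [
        "From: Alice <alice@example.com>\nTo: bob@example.com, carol@example.com\nSubject: Q3\n\nHi",
        "From: alice@example.com\nTo: Bob <bob@example.com>\nSubject: Re: Q3\n\nHello",
    ]
    for (src, dst), count in build_email_graph(sample).most_common():
        print(src, "->", dst, count)
```

The resulting edge counts can be fed into any graph library to find who talks to whom most often, which is the raw material for the social analytics described above.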

Other possible data-intensive applications include eDiscovery and log management/migration. Because Timberwolf and HBase are both open source, email analytics through Big Data is more accessible than ever before. The one downside is that HBase has no onboard security, so going the Timberwolf route for email analytics will mean extra security preparations for most enterprises.

Gander aims its Big Data analytics at your email history and compares it against incoming email, placing email from robots into a Skim folder and email that needs a prompt reply into a Respond section. Gander can run these analytics across hundreds to thousands of mailboxes in an enterprise. Behind the scenes, Gander uses a MongoDB database that presorts large volumes of email, and accessing the email data from MongoDB is a lot more economical than pulling it from an Exchange Server database with one of the email analytics tools on the market. This is another case where corporate data becomes accessible for further analytics and reporting to a wider audience than, say, just Exchange administrators.
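Gander’s actual classification rules were not described, so the following is only a minimal Python sketch of the general idea under stated assumptions. It treats mail with common bulk-mail signals (a `noreply`-style sender, a `List-Unsubscribe` header, or a `Precedence: bulk` header) as robot mail for the Skim folder, and mail from people the user has previously written to as candidates for Respond. The function names, the history model, and the heuristics are all hypothetical.

```python
from email import message_from_string
from email.message import Message
from email.utils import getaddresses, parseaddr

# Hypothetical signals of automated ("robot") mail; not Gander's actual rules.
ROBOT_LOCALPARTS = {"noreply", "no-reply", "donotreply", "notifications"}
BULK_PRECEDENCE = {"bulk", "list", "junk"}


def sender_address(msg: Message) -> str:
    """Return the lowercase From address, or '' if there is none."""
    return parseaddr(msg.get("From", ""))[1].lower()


def build_reply_history(sent_messages):
    """Collect every address the user has sent mail to (a stand-in for 'email history')."""
    history = set()
    for raw in sent_messages:
        msg = message_from_string(raw)
        for _, addr in getaddresses(msg.get_all("To", []) + msg.get_all("Cc", [])):
            if addr:
                history.add(addr.lower())
    return history


def is_robot(msg: Message) -> bool:
    """Heuristic check for automated or bulk mail."""
    local_part = sender_address(msg).split("@", 1)[0]
    if local_part in ROBOT_LOCALPARTS:
        return True
    # Mailing lists and bulk senders commonly set these headers.
    if msg.get("List-Unsubscribe") is not None:
        return True
    return str(msg.get("Precedence", "")).strip().lower() in BULK_PRECEDENCE


def triage(raw: str, history: set) -> str:
    """Route an incoming raw message to 'Skim', 'Respond', or 'Inbox'."""
    msg = message_from_string(raw)
    if is_robot(msg):
        return "Skim"
    if sender_address(msg) in history:
        return "Respond"
    return "Inbox"
```

Robot detection runs first, so a notification that happens to come from a known correspondent’s system still lands in Skim; a production system would likely replace these fixed rules with statistics learned from each user’s mailbox.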

The Riparian Data team stressed to me the advantages of Big Data as an email management platform, including storage advantages, keywords, and categorization happening behind the scenes. At first, none of this really fazed me, since I’ve written about email technologies in the past. It wasn’t until I considered Big Data, large-scale email management across an enterprise, and tools like Gander that don’t require user intervention all together that I began to truly appreciate the potential roles of Big Data in enterprise email management.

The early stages of Gander point to a Big Data future for enterprise-level email management tools, including:

  • Large-scale email management on BYOD devices, spanning thousands to tens of thousands of email inboxes.
  • APIs for better integration with third party tools.
  • Compliance program support for Sarbanes-Oxley and other regulations that govern email and email archives.
  • Another weapon in the arsenal against spam.

Big Data and eDiscovery

Email still plays a dominant communication role in businesses today; consequently, it also plays a role in corporate bad behavior.

A recent Data Center Journal article by Aron Israely, “eDiscovery and Big Data: Time to Call In the Expert?”, points to the complexities of analyzing enterprise email that sits across multiple platforms, where personal email folders may contain multiple data types. eDiscovery uses Big Data analytics to analyze email storage during white-collar criminal cases to help turn up evidence of wrongdoing. Such tools work across .PST files and storage archives alike.
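As a rough illustration of the keyword-search side of eDiscovery, here is a minimal Python sketch that scans messages for whole-word, case-insensitive keyword matches in the subject and plain-text body. It is not how any product mentioned here works. Reading .PST files requires a third-party parser, so the sketch assumes mail has been exported to an mbox archive (read with the standard `mailbox` module) or otherwise loaded as `email.message.Message` objects; `search_archive` is a hypothetical name.

```python
import mailbox
import re
import sys
from email.message import Message
from typing import Iterable, List, Tuple


def search_archive(
    messages: Iterable[Tuple[object, Message]], keywords: List[str]
) -> List[Tuple[object, str, List[str]]]:
    """Hypothetical helper: return (key, subject, hits) for each message whose
    subject or text/plain body mentions any keyword (whole word, any case)."""
    patterns = {k: re.compile(r"\b" + re.escape(k) + r"\b", re.IGNORECASE) for k in keywords}
    results = []
    for key, msg in messages:
        subject = str(msg.get("Subject", ""))
        texts = [subject]
        # walk() visits the message itself and, for multipart mail, every sub-part.
        for part in msg.walk():
            if part.get_content_type() != "text/plain":
                continue
            payload = part.get_payload(decode=True)  # bytes after transfer decoding
            if not payload:
                continue
            charset = part.get_content_charset() or "utf-8"
            try:
                texts.append(payload.decode(charset, errors="replace"))
            except LookupError:  # unknown charset label; fall back to UTF-8
                texts.append(payload.decode("utf-8", errors="replace"))
        text = "\n".join(texts)
        hits = [k for k, pattern in patterns.items() if pattern.search(text)]
        if hits:
            results.append((key, subject, hits))
    return results


if __name__ == "__main__":
    # Usage: python search_archive.py archive.mbox keyword [keyword ...]
    box = mailbox.mbox(sys.argv[1], create=False)
    try:
        for key, subject, hits in search_archive(box.iteritems(), sys.argv[2:]):
            print(key, subject, hits)
    finally:
        box.close()
```

Real eDiscovery tools go much further (attachments, deduplication, custodians, legal holds), but the same scan-and-match loop is what gets distributed across a cluster when archives reach Big Data scale.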

Big Data Analytics vs. email

Big Data is built for the large-scale email battle, whether that means better enterprise email management, support for large BYOD initiatives, or eDiscovery. Despite the maturation of enterprise social networks and collaboration platforms, corporate email is still a popular communications form, and Big Data has the potential to help enterprises tackle email challenges on all fronts rather than one inbox at a time.