When Jason Atlas joined the online threat-protection firm IID two years ago, he and founders Rod Rasmussen and Lars Harvey could see that their legacy database "was already falling over." Atlas, Vice President of Engineering and Technology at IID, noted that the database "was 180 GB in size, and we were clustering MySQLs and response times for certain queries. It was becoming untenable. Relational databases just could not scale to what we needed to do."
IID aggregates and analyzes widely sourced threat data to enable the protection of client assets and users. The firm's customers include major financial firms, government agencies, leading e-commerce companies, and social networks and ISPs that use IID's products to detect and mitigate threats. IID does this by sharing information, compiling and comparing data, analyzing it, and then performing actions on it.
With the legacy database, said Atlas, "we weren't able to do a lot of the other things we wanted to do down the road. So we had to start looking at alternative technologies to solve these use cases. How do you store enormous quantities of variant types of data? How do you then build a system that allows all this different type of information to be readily retrievable, and analyze it, and also being readily queryable, sortable, and viewable, as well as obviously being highly secured?"
IID also found that "we're sending data and information out to organizations, and none of them are really talking to each other. There are threats out there, but there wasn't an enormous amount of coordination and data sharing in the systems."
"Rod and Lars," said Atlas, "came to the realization that they needed to build a big data system that allowed all of our customers to share information and analytics with each other."
DataStax Enterprise: Open source technology with full vendor support
"The first thing I did in order to test our technologies," explained Atlas, "was to take the biggest problem that we have—our Internet protocol (IP) database, and add a zero to the amount of data, because if we're going to solve this problem then let's solve it at a real scale, for the future."
"Using real-world examples of how people are querying our IP data and accessing it, we removed it from MySQL and tested it on three different NoSQL platforms." Atlas stated, "Cassandra from DataStax blew them away. It met all of our requirements, it was easy to work with, and we were able to get it up and running in an Amazon Web Services (AWS) instance."
"We are in the process of designing a massive-scale database, up to a petabyte of information, that we are building our own dedicated data center for," said Atlas. He added, "Cassandra provided by DataStax is definitively our key-value store of choice."
"I have not found anything that comes close to the linear scale capabilities and the rewrite capabilities that Cassandra has," said Atlas. "And I've tested a lot of databases. Just as a benchmark: It is three times faster than Microsoft Cosmos. I built a data intelligence platform on Cosmos three years ago, and I can tell you that Cassandra is actually, literally three times faster."
"We realized we had to know how to index our database within the NoSQL store," said Atlas. "Solr is also an Apache component, which we're currently looking at. And we also need a Hadoop MapReduce computational environment, a compute environment, again for analytics, for crunching, for associations, computation, for queries of that nature."
"I was terrified of going to market with all of these new technologies that didn't have a vendor behind them to give me 24/7 support," said Atlas. "The fact of the matter was I needed all these technologies, and DataStax bundled them all for me."
"In addition, they gave me this awesome ops console that allows me to monitor and measure things that basically is the equivalent of a business activity monitor, including workflow elements and the Cassandra store. And I have the support," added Atlas, "but I am not vendor locked, because I am sitting on open source systems."
"I now have all the benefits of open source, with the support and benefits of an actual vendor," said Atlas. "In my mind I get the best of both worlds, and I have the technology that really maps to my use cases. I am actually quite pleased, to be perfectly honest."
Results with DataStax Enterprise
Quick project completion and time to revenue
"We migrated our entire product platform onto AWS in three months," said Atlas. "Our IP service is now a cloud service running in a secured environment and is running on Cassandra as a full-blown product. It is a 'pay for' product as we speak."
"The new platform has been designed around DataStax as our core engine," added Atlas. "It's what I call our 'black box' system. The data goes in and magic happens after that. And then the APIs go and retrieve data and we present it to our customers."
"We just shipped a demo. It's a working functional prototype," said Atlas, "and it is being presented at RSA as we speak."
Three core elements with cost savings
Atlas remarked, "The fact that they integrated three core elements with an ops console on top of it that allows us to monitor and measure was enormous."
He explained, "I have three technology pieces in this stack: 1) computation, with MapReduce through Hadoop; 2) search and indexing, with Apache Solr; and 3) store and security elements, with Cassandra from DataStax."
"And I have them all from the same vendor, integrated with dev and 24/7 support, for one-fifth the price that I would pay for a relational database."
Top-level security at the atomic level
"One of the things that was key to us, that Cassandra affords, is what I call atomic level auditing and governance in our system," said Atlas. "We're dealing with security companies, a lot of these places are audited. They have to make sure that the information they distribute is only touched by the right people, and only seen by the right people."
Using DataStax Enterprise, "we were able to audit at a low level within Cassandra, essentially every single transaction, every time somebody accesses somebody else's data or they access their own data."
"This is a very high level of making sure there is no repudiation, making sure there's no penetration of the systems," said Atlas. "So we really have built an incredibly secure ecosystem and we would not have been able to use systems besides DataStax Enterprise, because we couldn't secure them and they would not allow that level of atomicity."
Developer resources included
"NoSQL is a relatively new programming methodology. People still think in relational terms," explained Atlas. "They don't necessarily know how to work with a NoSQL system correctly."
DataStax provides "the developer resources, the developer training programs, and onboarding programs to get your developers up to speed. So that benefit was huge."
The bottom line: Value and security
"Customers and enterprises are already very reluctant and scared to share data," said Atlas. "You need to give serious value, and you need to make darn sure you're protecting and treating their data, and caring about it as much as they do."
"And that's not just a marketing pitch," added Atlas, "that's how we designed the system, and Cassandra and DataStax are a core part of that."
Brian will do client work for AtTask.
Brian Taylor is a contributing writer for TechRepublic. He covers the tech trends, solutions, risks, and research that IT leaders need to know about, from startups to the enterprise. Technology is creating a new world, and he loves to report on it.