Beyond the Stone Age
"We have beaten up hardware relentlessly to improve performance and also energy efficiency in data centers," said Mike Hoskins, Chief Technology Officer At Actian, a provider of big data analytics solutions, "There has been so much invested, yet the results have been poor."
Hoskins believes that the path toward more effective server utilization in data centers rests in software, and he uses the example of a two-processor, 16-core server to illustrate his point.
"One can argue that software is in a kind of "stone age," in that it just hasn't kept pace with hardware innovations," said Hoskins. "In the case of a two processer, 16-core server, single-threaded software, which most software today is, only keeps one to two cores busy and the other 14 cores go unused. We experienced a bit of a breakthrough when virtualization technology came on the scene, because in an environment like VMware, you could take eight cores of processing and spread them over four virtual machines that used two cores each." This compensated for the limitation of single-threaded software because the software could be spread across four different server engines with the core divisions afforded by the virtualization.
Unfortunately, when we are talking about big data and analytics, "sleight of hand" virtualization techniques that can improve software performance simply don't work. The reason is that big data, with its massively parallel processing, isn't well suited for virtualization. Consequently, sites are potentially left with the challenge of crunching through massive amounts of data in a small amount of time, and possibly bumping up against limits in the software they are using. At the same time, their budgets constrain them from investing in processing more powerful than what they already have available to work with: various incarnations of x86-based server technology.
Hoskins believes that sites can overcome the single-threaded core utilization limits of most software if they can somehow lash processing cores together in a harmonious, memory-only data flow engine. Such an engine would parallel-process incoming data and also address the various steps of big data processing that have to be done, such as data cleaning, aggregating, ingesting, and finally analytics.
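To make the idea concrete, here is a minimal Python sketch of a parallel, in-memory data flow. It is an illustration only, not Actian's engine: the stage functions (`ingest`, `clean`, `aggregate`, `analyze`), the `key,value` record format, and the stage order are all assumptions made for the example. Each partition of the input runs through the stages on its own worker, and the partial results are merged at the end.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# All names and the "key,value" record format below are hypothetical,
# chosen only to illustrate the steps named in the article.

def ingest(lines):
    """Parse raw 'key,value' text lines into tuples of strings."""
    return [tuple(line.split(",", 1)) for line in lines]

def clean(pairs):
    """Drop malformed rows and convert each value to a float."""
    rows = []
    for pair in pairs:
        if len(pair) != 2:
            continue  # no comma in the line, so no value to read
        key, raw = pair[0].strip(), pair[1].strip()
        try:
            rows.append((key, float(raw)))
        except ValueError:
            continue  # value is not numeric
    return rows

def aggregate(rows):
    """Sum the values per key within a single partition."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0.0) + value
    return totals

def process_partition(lines):
    """Run every per-partition stage in memory, with no intermediate files."""
    return aggregate(clean(ingest(lines)))

def partition(items, n):
    """Split items into at most n contiguous slices of roughly equal size."""
    size = max(1, -(-len(items) // n))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def run_pipeline(lines, workers=4, executor_cls=ProcessPoolExecutor):
    """Process partitions on separate workers, then merge the partial totals."""
    merged = {}
    with executor_cls(max_workers=workers) as pool:
        for partial in pool.map(process_partition, partition(lines, workers)):
            for key, total in partial.items():
                merged[key] = merged.get(key, 0.0) + total
    return merged

def analyze(totals):
    """Final analytics step: return the key with the largest total, if any."""
    return max(totals, key=totals.get) if totals else None
```

With the default `ProcessPoolExecutor`, each partition can run on a separate core; in a script, the call to `run_pipeline` should sit under an `if __name__ == "__main__":` guard. Passing `executor_cls=ThreadPoolExecutor` is a convenient way to try the flow without spawning processes, although in standard CPython threads will not spread this CPU-bound work across cores.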
Tiers of data
The idea is to move tiers of data in memory closer to the CPU to improve overall performance. Here is how it works.
"In this environment, a processing engine can know where data is executing and then push the data that will be required into L3 cache," said Hoskins. "This is one way that we can get around software constraints and get the most out of hardware."
The technology is promising and exciting to sites that want to stick with commodity x86 servers for their big data processing. As use cases begin to appear, it is also likely to add fuel to current architectural debates as to whether x86, Unix-based, or other hybrid platforms are best suited for enterprise HPC (high-performance computing).
Mary E. Shacklett is president of Transworld Data, a technology research and market development firm. Prior to founding the company, Mary was Senior Vice President of Marketing and Technology at TCCU, Inc., a financial services firm; Vice President of Product Research and Software Development for Summit Information Systems, a computer software company; and Vice President of Strategic Planning and Technology at FSI International, a multinational manufacturing company in the semiconductor industry. Mary is a keynote speaker and has more than 1,000 articles, research studies, and technology publications in print.