
x86 cluster computing has been a processing workhorse for big data. It scales, it is economical, and it is a familiar platform, so it is likely to maintain a major role in big data processing, especially for the myriad Hadoop applications that run in batch mode and don’t require rapid delivery of results.
However, as more organizations gain big data experience and begin to probe areas where big data and analytics can be productively applied, such as compliance and security monitoring at the network edge, there is likely to be a call for higher-performance computing that can process and return results in real time.
“We see these new use cases surfacing every day,” said Pat McGarry, vice president of engineering for Ryft, a real-time search and analytics firm. “In one case, an organization had to spend millions of dollars for large x86 computing clusters in multiple locations — but it was unable to send back all of this edge data in a timely fashion to its central location.”
McGarry believes that a new approach, which he calls hybrid computing, can fix this data quandary.
“What we do in the hybrid approach is use an x86 chip on the front end with an underlying FPGA fabric that does the rest of the work,” said McGarry. A field-programmable gate array is an integrated circuit that a manufacturer or customer can configure after manufacturing to suit the needs of a particular machine. In this case, Ryft programs the FPGA to interact with fast solid-state storage that is striped across multiple SSD devices and can deliver twice the speed of conventional SSD storage.
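To make the storage side of that concrete, the sketch below shows how striping works in general: fixed-size chunks alternate between two drives, so sequential reads can be issued to both devices in parallel. The device paths, chunk size and round-robin layout are illustrative assumptions, not details of Ryft’s implementation.

```python
# Minimal sketch of why striping data across two SSDs can roughly double
# read throughput: fixed-size chunks are laid out round-robin across the
# devices, so a sequential read can hit both drives at once.
# Device paths and chunk size are assumptions for illustration.
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4 * 1024 * 1024                  # 4 MiB stripe unit (assumed)
DEVICES = ["/dev/nvme0n1", "/dev/nvme1n1"]    # hypothetical SSD devices

def read_chunk(device: str, offset: int) -> bytes:
    """Read one stripe unit from a single device."""
    with open(device, "rb") as f:
        f.seek(offset)
        return f.read(CHUNK_SIZE)

def read_striped(logical_offset: int, num_chunks: int) -> bytes:
    """Reassemble a logical extent whose chunks alternate between devices."""
    jobs = []
    with ThreadPoolExecutor(max_workers=len(DEVICES)) as pool:
        for i in range(num_chunks):
            chunk_index = logical_offset // CHUNK_SIZE + i
            device = DEVICES[chunk_index % len(DEVICES)]        # round-robin placement
            device_offset = (chunk_index // len(DEVICES)) * CHUNK_SIZE
            jobs.append(pool.submit(read_chunk, device, device_offset))
    return b"".join(job.result() for job in jobs)
```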
McGarry said that a single box outfitted with this technology could outperform a 100-node cluster of x86-based machines in benchmarks.
“Speed was not the issue, but the problem was how a big data user could employ this machine without having to understand the underlying hybrid architecture,” he said. “The architecture must seamlessly handle multiple open APIs and multiple programming languages. It’s difficult to get this right, but this was our goal — to create a user-friendly high-performance computing environment where people could write big data analytics algorithms in any programming language and get immediate answers out of their data.”
One key to the process is that many data preparation steps, such as ETL (extract, transform and load), are eliminated with the FPGA approach. They become unnecessary because the hybrid architecture’s significantly faster processing can work through everything that comes down the data pipelines unedited; users then screen out the data they don’t want to see at the back-end analytics and reporting stage.
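As a rough illustration of that shift, the sketch below skips an up-front ETL pass, scans a raw feed as-is and applies the screen only when a report is generated. The record format and filter are invented for the example and are not drawn from Ryft’s product.

```python
# Instead of an up-front ETL pass that cleans and loads every record before
# anyone can query it, the raw stream is scanned unedited and unwanted
# records are screened out only at the reporting stage.
import csv, io

RAW_FEED = io.StringIO(
    "timestamp,device,status\n"
    "2016-03-01T10:00:00,edge-01,OK\n"
    "2016-03-01T10:00:05,edge-02,ALERT\n"
    "2016-03-01T10:00:09,edge-01,OK\n"
)

def report(raw_stream, keep=lambda row: True):
    """Scan the unedited feed and apply the screen only when reporting."""
    for row in csv.DictReader(raw_stream):
        if keep(row):
            yield row

# The analyst decides at report time what to screen out, e.g. only alerts:
for row in report(RAW_FEED, keep=lambda r: r["status"] == "ALERT"):
    print(row["timestamp"], row["device"])
```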
McGarry said that the hybrid processing approach is fast because it uses bare-metal processing “95% of the time,” thanks to the FPGA fabric.
“We use FPGA almost exclusively, although there are some cases when we default to using the x86 processors,” he said. These default cases most often involve deep mathematics, such as trigonometric or calculus functions, which require an x86 processor paired with either a GPU (graphics processing unit) or a math co-processor.
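The division of labor McGarry describes might look something like the hypothetical dispatcher below: scan-and-search style primitives go to the FPGA path, while anything needing transcendental math falls back to the CPU (or a GPU or math co-processor). The primitive names and backends are assumptions made for illustration, not Ryft’s API.

```python
# Hypothetical dispatcher: simple search-style primitives run on the FPGA
# path, while operations needing transcendental math (trig, calculus-style
# functions) fall back to the x86 path.
import math

FPGA_PRIMITIVES = {"exact_search", "fuzzy_search", "term_frequency"}
MATH_PRIMITIVES = {"sin", "cos", "log"}          # need floating-point math units

def run_on_fpga(primitive, data):
    # Placeholder for handing the operation to the FPGA fabric.
    return f"FPGA: {primitive} over {len(data)} records"

def run_on_cpu(primitive, data):
    # Fallback path: evaluate on the host CPU (or GPU / math co-processor).
    fn = getattr(math, primitive)
    return [fn(x) for x in data]

def dispatch(primitive, data):
    if primitive in FPGA_PRIMITIVES:
        return run_on_fpga(primitive, data)
    if primitive in MATH_PRIMITIVES:
        return run_on_cpu(primitive, data)
    raise ValueError(f"unknown primitive: {primitive}")

print(dispatch("exact_search", ["a", "b", "c"]))
print(dispatch("sin", [0.0, math.pi / 2]))
```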
To date, the primary user of this hybrid approach to big data has been government, but the technology is beginning to be commercialized. How aggressively will companies consider or adopt it?
“It really depends on the individual business case,” said McGarry. “If it is an enterprise with a massively centralized data location, there won’t be immediate change or a call for better edge processing overnight. But as data pushes out to the edge and there is a need to rapidly bring those data streams into a central point, sites will find that the technology can become an edge data center in its own right.”