Read about an algebraic equalizer that solves a big data analytics challenge.
The Internet of Things is opening more new sources of data than ever before, and the fire hose spewing billions of data points into enterprises now runs around the clock. At the same time, organizations want to append and recombine various subsets of this data into breakthrough databases capable of producing innovative business insights.
While this process of data recombination is a recent offshoot of harnessing big data for analytics, one fact about it is not new: the throttling element for reaching the data recombination goal continues to be data and system integration — both historical barriers for IT. The data integration challenge is what makes fresh approaches like that taken by Algebraix Data so intriguing.
"For 40 years, everyone has been building relational databases to bring data together, and we have been dominated by these relational ecosystems," said Charlie Silver, CEO of Algebraix Data. "What people don't always realize is that in the course of technology innovation throughout the decades, there are now more new computer languages than human languages. Along with this, there are hundreds and hundreds of different data models."
The problem with independent development of different computer languages and data models is that no thought is given to integration, so these assets don't talk to each other. For corporate data aggregators, these disparate data models and data sources become major obstacles when it is necessary to find ways for all of them to work together in an analytics scenario.
"When we founded our company, it was founded with a mission to find a common and universal denominator that would make all of these diverse data sources talk to one another," said Silver. "Mathematics has always been the universal language, whether you are finding a mathematical expression that can represent a graph, Internet of Things data, a database, or text-based data. All of these data can be described algebraically through variables, unions, intersections, distributions, and so on."
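To make Silver's claim concrete, here is a minimal sketch (purely illustrative, not Algebraix's actual model; the field names and record ids are hypothetical) of how two differently shaped sources can be lowered to one common set-of-triples representation, after which integrating them is an ordinary set union:

```python
# Hypothetical sketch: a relational row and an IoT sensor reading, both
# expressed as sets of (attribute, value, record-id) triples.

# A row from a relational table:
db_row = {("name", "Alice", "rec1"), ("city", "Berlin", "rec1")}

# A reading from an IoT sensor feed:
sensor = {("temperature", "21.5", "rec2"), ("unit", "C", "rec2")}

# Once both sources live in the same representation, "integration"
# is just set algebra -- here, a union:
combined = db_row | sensor

print(len(combined))  # 4
```

The point of the sketch is that no custom connector between the two sources is needed; any operation defined on sets (union, intersection, difference) applies uniformly to both.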
Silver gave the example of describing a person who was a journalist. "Describing a particular individual as a journalist can be done in an algebraic statement," he said.
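As a rough illustration of what such a statement could look like (this is a hypothetical rendering in plain Python sets, not Algebraix's notation), "this person is a journalist" reduces to set membership or, equivalently, a non-empty intersection:

```python
# Hypothetical sketch: facts about an individual as a set of
# (attribute, value) pairs.

person = {("name", "Alice"), ("occupation", "journalist"), ("city", "Berlin")}

# "Alice is a journalist" as a set-membership statement:
is_journalist = ("occupation", "journalist") in person

# Equivalently, as a non-empty intersection with the defining set:
journalist_def = {("occupation", "journalist")}
also_journalist = bool(person & journalist_def)

print(is_journalist, also_journalist)  # True True
```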
The next step is to convince the tech industry of mathematics' universal potential and of its ability to serve as a kind of data "equalizer" that can plug any type of data into a single, shared community of data. For this to happen, there are just as many proprietary and political hurdles to overcome as there are technical challenges.
"We liken ourselves to more of a biotech than a technology startup," acknowledged Silver. "Our focus is on very deep research and development. We are like a biotech that first discovers a unique compound and then must figure out the precise applications in which the compound will work."
How the company is advancing its work
Algebraix recently secured $40 million in funding. It also has an active analytics relationship with Khan Academy, a free online K-12 education resource that focuses on math and science. In addition, Algebraix has changed its view on its research, which it had regarded as a closely held trade secret: it has now given some of its work to the open source community in the hope that new ways can be found to commercialize the technology.
"In the future, we are also planning to develop tools that organizations can use with the technology," said Silver, "but initially, we are providing the technology as a service where we perform the analytics using our approach and then deliver the end results to customers."
How well is the technology working?
"It is very effective in analytics queries," said Silver. "All of the data gets organized algebraically, so the data engine knows exactly where to go to find the answers to a specific query. In the process, 99 percent of the data in a data repository doesn't have to be processed because of these selective criteria. This can transform performance: a job that might take hours or days to run on a Hadoop cluster can get to the same result in a matter of seconds."
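The selectivity idea Silver describes can be sketched in miniature (a hypothetical toy, not the company's engine): if records are organized up front by an algebraic key, a query inspects only the matching partition and never reads the rest of the repository.

```python
# Hypothetical sketch: partition record ids by an attribute value once,
# so that a query touches only the relevant partition.

from collections import defaultdict

records = [
    {"id": 1, "occupation": "journalist"},
    {"id": 2, "occupation": "engineer"},
    {"id": 3, "occupation": "journalist"},
]

# Organize once: group record ids by occupation.
index = defaultdict(set)
for r in records:
    index[r["occupation"]].add(r["id"])

# Query: only the "journalist" partition is read; every other
# partition (the bulk of the data) is skipped entirely.
journalists = index["journalist"]
print(sorted(journalists))  # [1, 3]
```

At the scale of a real repository, the skipped partitions are the "99 percent" that never needs to be processed.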