Three data warehouse trends to watch

Customer insights with big data come from multiple sources, and analysts need quick, direct access to data, according to BitYota's Dev Patel in a Q&A with TechRepublic.


To derive the full value of big data analytics, an enterprise needs to have a refined and granular view of its customers -- how they digitally interact with its applications and offers. To gain insights, an enterprise needs to be able to analyze unstructured and semi-structured data almost as quickly as it accesses the data.

Traditional database solutions are not able to do this, said BitYota CEO Dev Patel in a recent telephone interview with TechRepublic. You simply won't be able to derive the insights from the data -- legacy solutions can only aggregate broad customer patterns after the fact.

In November 2012, Data Warehouse as a Service (DWaaS) firm BitYota launched its flagship SaaS-based analytics offering at the Amazon Web Services (AWS) re:Invent conference. BitYota says it offers a "full-scale data warehouse" with the "flexibility and cost-effectiveness" of the AWS infrastructure, permitting an enterprise "to unleash the value of their data to gain insights and make better business decisions."

Patel spoke in detail about the three main trends that he sees in the data warehouse space:

  • analytics over data from multiple sources,
  • direct access to data by analysts, and
  • insights from high-velocity data.

TechRepublic: Let's say I am a potential customer with a legacy database solution. How would you define what is changing in analytics? What do I tell my stakeholders?

Dev Patel: To better understand your business, you need to better understand the interactivity of your customers with your business at a very granular level. Your traditional systems will not allow you to do that; they will only allow you to do it at some aggregated level.

By definition, you're going to lose some of the features in the data that you need to get better insights. And you need to be able to get better insights from data from multiple sources, and your traditional systems will not allow you to take data from multiple sources to get your insights.

Traditional systems don't let you obtain detailed insights from raw, very granular data, or get insights from data from multiple sources where the velocity of each source can be very different. That is something they don't do well.

TechRepublic: BitYota launched as a startup in 2011. What business needs and tech trends were you trying to address?

Dev Patel: I'd had a lot of experience at Yahoo dealing with large data. And one of the biggest challenges that I had learned, that companies even as big as Yahoo have, is dealing with analytics from data from multiple sources. As soon as the sources increase, the formats increase. And analytics over different data formats, all at the same time, was a huge challenge, and continues to be a challenge.

The second thing I saw at Yahoo was delivering analytics over fresh data, i.e., extracting temporal value. For example: this data point occurred 15 minutes ago, what should I do? That was very difficult, because the only way to do analytics was after a traditional Extract, Transform, Load (ETL) process, and that always takes time.
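The latency Patel describes can be seen in even a toy batch ETL pass: events are parsed, reshaped, and loaded in a batch, and anything arriving after the batch is cut stays invisible until the next run. A minimal sketch (all table names and event data here are illustrative, not BitYota's):

```python
import sqlite3

# Hypothetical raw click events, as they might arrive from an application log.
raw_events = [
    '2013-01-01T10:02:11,user42,click,/offers',
    '2013-01-01T10:05:37,user7,click,/checkout',
]

def etl_batch(events):
    """A minimal batch ETL pass: parse the raw lines (extract), reshape
    them into rows (transform), and insert them into a warehouse table
    (load). Events arriving after the batch is cut are not queryable
    until the next run -- the freshness gap Patel refers to."""
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE clicks (ts TEXT, user TEXT, action TEXT, page TEXT)')
    rows = [tuple(line.split(',')) for line in events]                 # transform
    conn.executemany('INSERT INTO clicks VALUES (?, ?, ?, ?)', rows)   # load
    return conn

conn = etl_batch(raw_events)
count = conn.execute('SELECT COUNT(*) FROM clicks').fetchone()[0]
print(count)  # 2
```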

And the third big trend I saw at that time was that cloud was increasingly becoming the new paradigm for operations and next-generation data centers. Traditional databases were designed to be delivered on customer appliances, not on heterogeneous virtualized environments.

TechRepublic: What are the main trends you are seeing in the data warehouse service space?

Dev Patel: There are three trends. The first is analytics over data from multiple sources -- new sources and traditional sources -- and joining those two. An example of a new source would be data from an application. And a traditional source would be transaction data, a CRM database, that sort of stuff.

The second trend is that analysts want to be able to access the data directly, without an additional layer of engineers having to transform or translate that data to make it consumable by an analyst.

And the third is that analysts want to be able to analyze that data at almost the velocity at which it is being generated. And you need to be able to do that over the format in which the data is generated. So how do you run analytics over JavaScript Object Notation (JSON)? Remember, the language of analysts is SQL.

TechRepublic: Could you elaborate on those three trends?

Dev Patel: For that, we need to believe in big data, and specifically in certain core elements of it.

Firstly, data is coming at different speeds, and it is a fast-growing area. Depending on the success of an application, every click by a user generates a new data point, and that data arrives continuously, in real time. And that data tells a part of the company a lot.

That part of the company being product management, the designers, the people who understand the user experience, and people who want to derive future functionality based on what users are doing -- it tells them a lot. And they are learning from their users' engagement with the product.

The second group that is learning from this continuous stream of data are the people who make money within the application. So you may have application companies that put an offer out. For example, a games company says you can have a $10 credit for five dollars because you are a special customer. Or: we haven't seen you active for a long time, and suddenly we are seeing you; we would like you to maintain the level of engagement you had two months ago, and we are going to give you an offer to help you do that. So there are pockets of people who want to leverage this continuous data almost all the time.

With data coming continuously from multiple sources, a very astute product manager will take static data from the finance systems, where purchase history is maintained and where CRM data is maintained about their customers, and then join it with the streaming data to make new product offers.
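The join Patel describes can be sketched in a few lines: a static CRM/purchase-history table keyed by user, matched against a fresh stream of click events to decide who gets an offer. The segments, pages, and users below are hypothetical, purely to show the shape of the join:

```python
# Hypothetical static CRM data, as it might sit in a finance system.
crm = {
    'user42': {'lifetime_spend': 120.0, 'segment': 'loyal'},
    'user7':  {'lifetime_spend': 5.0,   'segment': 'new'},
}

# A fresh stream of click events arriving from the application.
click_stream = [
    {'user': 'user42', 'page': '/offers'},
    {'user': 'user7',  'page': '/checkout'},
]

def offers_for(stream, crm_table):
    """Join each streaming event with the static CRM record for that
    user, and collect the users who should receive a promotion now
    (here: loyal customers currently browsing the offers page)."""
    targeted = []
    for event in stream:
        profile = crm_table.get(event['user'])
        if profile and profile['segment'] == 'loyal' and event['page'] == '/offers':
            targeted.append(event['user'])
    return targeted

print(offers_for(click_stream, crm))  # ['user42']
```

The same pattern scales up to a warehouse-side JOIN between an event table and a CRM table; the point is that both sides must be queryable together, at the stream's speed.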

A product manager might ask, hey, last Halloween, what happened? They can analyze long-term trending to identify seasonality effects, to identify possible effects on a promotion they are doing now [compared to prior offers]. All these are insights that are being generated now from combining new data streams with older, static data sources, and so on. That's a very significant trend I am seeing.

Also, what I am seeing concerns the insights folks -- the analysts -- whose language of analysis has always been SQL, and the business intelligence (BI) tools they use for reporting and dashboarding. The typical BI tool would be Tableau.

There is a layer emerging between them and the data. Often that layer exists because new technologies have arisen that are essentially engineers' frameworks, and those frameworks require programming to make the data usable by an analyst.

And that trend is damaging. We have got to look for ways of breaking it, because a gulf has developed: you are slowed down by an engineer having to do development work to refine or transform the data so that an analyst can consume it.

And engineers are hard to find and difficult to hire, and that extra layer is really not required. People are now looking to get rid of it.

And then there is the velocity of analytics, which is the third trend I am seeing. Extracting the temporal value of data is becoming more critical. How do you design systems where you can collect data very quickly and start to provide hourly metrics, hourly key performance indicators (KPIs), or even 15-minute KPIs -- analytics KPIs that let you understand what's going on in your systems?
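The simplest form of the hourly KPI Patel mentions is an event count bucketed by clock hour. A minimal sketch, with hypothetical timestamps:

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamped events streaming in from an application.
events = [
    '2013-01-01T10:02:11',
    '2013-01-01T10:48:03',
    '2013-01-01T11:15:27',
]

def hourly_kpis(timestamps):
    """Bucket events into per-hour counts -- the simplest hourly KPI.
    A 15-minute KPI would only change the truncation granularity."""
    buckets = Counter()
    for ts in timestamps:
        # Truncate each timestamp to the top of its hour.
        hour = datetime.fromisoformat(ts).strftime('%Y-%m-%d %H:00')
        buckets[hour] += 1
    return dict(buckets)

print(hourly_kpis(events))
# {'2013-01-01 10:00': 2, '2013-01-01 11:00': 1}
```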

As an example, you want to be able to analyze your data in the format it arrives in. Increasingly, that format is JSON; we are seeing a lot of application developers adopt JSON as their data format.

And if you want to do analytics at almost the speed the data is coming in, then you can't afford to translate that data into a structured format first. You need to be able to do analysis directly over it. If you do that, you will get the velocity in analytics that you require, hourly or faster.
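One widely available way to run SQL directly over JSON, without first forcing it into a fixed schema, is a database's JSON functions. A sketch using SQLite's `json_extract` (this requires a SQLite build with the JSON1 extension, which most modern builds include; the events and field names are hypothetical, and this is not BitYota's engine):

```python
import sqlite3

# Raw JSON event documents stored as-is, one per row -- no upfront schema.
events = [
    '{"user": "user42", "action": "click", "value": 3}',
    '{"user": "user7", "action": "click", "value": 5}',
    '{"user": "user42", "action": "view", "value": 1}',
]

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE events (doc TEXT)')
conn.executemany('INSERT INTO events VALUES (?)', [(e,) for e in events])

# The analyst's SQL reaches into the JSON directly via json_extract,
# so no ETL step translates the documents into structured columns.
total = conn.execute(
    "SELECT SUM(json_extract(doc, '$.value')) FROM events "
    "WHERE json_extract(doc, '$.action') = 'click'"
).fetchone()[0]
print(total)  # 8
```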