Big data revelations depend upon how well data from different incoming sources is blended.
One reason big data blending is so important is that it enables uncommon types of data to coexist in a common data repository. Bringing together unusual data feeds makes possible new data combinations that were historically out of reach when this data was stored separately. This is also why one of the primary tasks of data analysts is to determine the “blends” of data that must be achieved, in much the same way that a winemaster determines the blends of grapes to yield a certain wine.
Unfortunately, when you’re working with Hadoop and other incoming sources of machine- and web-based data, data blending can become a highly complex process that involves millions of rows of data. When the process becomes too complicated, people don’t use it. This is where data begins to lose value.
“One site struggled for months trying to blend Google Analytics with corporate data, and just couldn’t do it,” said Jaime Merritt, Easyl vice president of product marketing for Progress Software. In a proof-of-concept project, the same staff attempted the blend with data blending tools and accomplished it in two hours. In another case, data analysts were able to cut a one-day data blending job to just half a day. Both sites were using Progress Software’s Easyl, which simplifies data preparation.
“There are two approaches when it comes to preparing big data for analytics,” said Merritt. “The first approach is building a data warehouse, which is defined and designed by business users and IT. This data warehouse is usually built from system of record and transactional data. The data is also cleaned and checked for quality with an ETL (extract, transform, load) process before it is blended. The second approach is what we focus on. This is a self-service data preparation approach that is especially designed for business users who have a need to prepare and query big data without support from IT. They can pull in data from different sources and work with data organization in formats that are already familiar to them.”
The self-service process consists of making a “rectangle” of the data that you want to blend. It works like this:
A big data blending tool like Easyl puts data from different sources into a series of rectangles, and then the end business user creates his own blended rectangle by selecting the data fields from each of these source rectangles that he wants to include. The user can also define certain data matches that he wants blended into a single data element (e.g., make Joe Wilson and Joseph Wilson the same person).
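To make the rectangle idea concrete, here is a minimal Python sketch of the steps described above. It is an illustration only and does not reflect how Easyl itself is implemented. The source rectangles, field names, sample values, the `ALIASES` match rule, and the `blend` helper are all hypothetical.

```python
# Two hypothetical source "rectangles": each is a list of rows (dicts).
crm_rows = [
    {"customer": "Joe Wilson", "region": "West", "account_id": 17},
    {"customer": "Ana Ruiz", "region": "East", "account_id": 42},
]
web_rows = [
    {"visitor": "Joseph Wilson", "page_views": 12, "browser": "Firefox"},
    {"visitor": "Ana Ruiz", "page_views": 5, "browser": "Chrome"},
]

# User-defined match rule (assumed): variant spellings mapped to one canonical name.
ALIASES = {"Joseph Wilson": "Joe Wilson"}


def canonical(name):
    """Return the canonical form of a name, or the name itself if no rule applies."""
    return ALIASES.get(name, name)


def blend(left, right, left_key, right_key, left_fields, right_fields):
    """Build a blended rectangle from selected fields of two source rectangles.

    Rows are matched on the canonicalized key. Left rows with no match on the
    right are dropped, so this behaves like an inner join.
    """
    right_by_key = {canonical(row[right_key]): row for row in right}
    blended = []
    for row in left:
        key = canonical(row[left_key])
        match = right_by_key.get(key)
        if match is None:
            continue
        out = {"name": key}
        out.update({field: row[field] for field in left_fields})
        out.update({field: match[field] for field in right_fields})
        blended.append(out)
    return blended


# The user picks "region" from one rectangle and "page_views" from the other.
blended = blend(crm_rows, web_rows, "customer", "visitor", ["region"], ["page_views"])
for row in blended:
    print(row)
```

Because the match rule is applied to both sides before rows are paired, Joe Wilson and Joseph Wilson end up as a single row in the blended rectangle, and only the fields the user selected are carried over.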
Easy-to-use data blending tools like these empower functions such as marketing, which is highly active in big data blending and queries, to be self-sufficient in data analysis. Users can even automate their resulting analytics processes to refresh at regular intervals so that data stays current.
“A person running marketing or sales needs to be empowered to know what they’re looking at,” said Merritt. “Because no matter how many features or algorithms you employ with this data, you will only get your data analysts querying this data if it is easy to use.”
Note: TechRepublic and ZDNet are CBS Interactive properties.