Big Data

5 steps to extracting big data gold

Big data success depends on extracting the right information. If your company is struggling to pinpoint and locate that information, follow this step-by-step guide on how to mine for big data gold.

As Greek mythology tells it, Jason was challenged to go past the end of the known world to retrieve the Golden Fleece, battling dangerous seas, dragons, and warriors along the way.

Corporate efforts to find the gold in data aren't nearly as strenuous, but they aren't necessarily easy, either.

"Years after big data projects were started, we still have companies tell us that they got pulled into big data, but didn't really know what they should be looking for," said Roy Johnson, chief data scientist at SPR, a consultancy specializing in digital transformation and enterprise computing. "We ask them why they started these projects and they tell us, because it was a trend that they felt they should be following."

Johnson said it's hard for firms to succeed with their big data projects when they really aren't sure of the business gold they're seeking. "The companies somewhat understand the kind of data that they're looking for, but they often underestimate how difficult it will be to 'mine' the gold from that data because they don't have the skillsets," said Johnson.


My own work with companies has shown that many have done quite well at finding big data gold. For those that are still striving, Johnson recommended a five-step approach.

1. Start with your traditional relational database data

This is the data that is stored in columns and rows in SQL or other relational databases and can easily be queried by users.

If you are in sales, you can start looking at different products, see how many of the products are sold where and to whom, how many of the products are returned, what your inventory levels are, and so on.

From this data alone, there are many relationships that can be made between sales, inventory levels, customer locations, service records, etc.
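As a minimal sketch of the kind of starting queries described above, the following uses Python's built-in sqlite3 module with an in-memory database. The table names, column names, and sample rows are hypothetical and invented purely for illustration; a real sales schema will look different.

```python
import sqlite3

# Hypothetical schema and sample rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (product TEXT, region TEXT, customer TEXT, units INTEGER);
    CREATE TABLE returns (product TEXT, units INTEGER);
    INSERT INTO sales VALUES
        ('umbrella', 'north', 'acme', 10),
        ('umbrella', 'south', 'zenith', 4),
        ('sandals', 'south', 'zenith', 7);
    INSERT INTO returns VALUES ('umbrella', 2);
""")


def units_by_product_and_region(conn):
    """Return (product, region, units_sold) rows, largest totals first."""
    return conn.execute("""
        SELECT product, region, SUM(units) AS units_sold
        FROM sales
        GROUP BY product, region
        ORDER BY units_sold DESC, product, region
    """).fetchall()


def return_rate_by_product(conn):
    """Return {product: returned_units / sold_units}."""
    # Each table is aggregated before the join so rows are not double counted.
    rows = conn.execute("""
        SELECT s.product, COALESCE(r.returned, 0) * 1.0 / s.sold
        FROM (SELECT product, SUM(units) AS sold
              FROM sales GROUP BY product) AS s
        LEFT JOIN (SELECT product, SUM(units) AS returned
                   FROM returns GROUP BY product) AS r
          ON r.product = s.product
    """).fetchall()
    return dict(rows)


for row in units_by_product_and_region(conn):
    print(row)
print(return_rate_by_product(conn))
```

Queries like these answer the "what sold where" and "how much came back" questions first, which tends to surface the follow-up questions that later steps try to answer.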

"Because there is so much data that is sales related, sales is an easy area for business users to get started in," said Johnson. "It is also an area where it is very easy to add on big data that can improve the depth of your queries so you really can find the elusive gold you're looking for."

2. Add big data to your existing relational database queries

Once a company understands its relational database sales data, there are bound to be new questions that surface.

"For instance, a company might see sales spike during periods of time that they have no explanation for," said Johnson. "These sales spikes are an anomaly, so the company decides to add some big data to its relational data to try to make sense of what is happening. One of the big data choices it makes is to bring in weather information, which could come in as an XML data stream. From this, the company discovers that sales tend to spike during days when the weather is cloudy, which perhaps drives people into activities like shopping."
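The weather example can be sketched roughly as follows, assuming the weather feed is XML with one element per day and that daily sales totals have already been pulled from the relational database. The XML element and attribute names, the dates, and the sales figures are all hypothetical.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical weather feed; element and attribute names are assumptions.
WEATHER_XML = """
<weather>
  <day date="2024-03-01" condition="cloudy"/>
  <day date="2024-03-02" condition="sunny"/>
  <day date="2024-03-03" condition="cloudy"/>
  <day date="2024-03-04" condition="sunny"/>
</weather>
"""

# Hypothetical daily sales totals, as they might come from a relational query.
DAILY_SALES = {
    "2024-03-01": 1200.0,
    "2024-03-02": 800.0,
    "2024-03-03": 1400.0,
    "2024-03-04": 900.0,
}


def parse_conditions(xml_text):
    """Map each date in the feed to its reported weather condition."""
    root = ET.fromstring(xml_text.strip())
    return {day.get("date"): day.get("condition") for day in root.iter("day")}


def average_sales_by_condition(daily_sales, conditions):
    """Group daily sales by weather condition and average each group."""
    grouped = {}
    for date, total in daily_sales.items():
        condition = conditions.get(date)
        if condition is None:
            continue  # no weather record for this day, so skip it
        grouped.setdefault(condition, []).append(total)
    return {condition: mean(totals) for condition, totals in grouped.items()}


conditions = parse_conditions(WEATHER_XML)
print(average_sales_by_condition(DAILY_SALES, conditions))
```

A gap between the averages only suggests a relationship worth investigating; it does not by itself show that the weather caused the spike.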

3. Incrementally add more big data to your queries

With the big data added to traditional sales query data, the company has now stepped into the domain of big data. From here, it is easy to append more types of big data. A logical next step for sales reporting might be to add Twitter and Facebook comments that customers and others are making about your products. "It is easy to add to your big data sources once you start asking questions about your sales and realize how certain types of data can help you understand your business better," said Johnson.
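One simple way to start with social comments is to count how often each product is mentioned per day, so the counts can sit next to daily sales figures. The sketch below assumes the comments have already been exported as date and text pairs; the comments and product names are made up, and plain substring matching is a deliberate simplification rather than real text analysis.

```python
from collections import Counter

# Hypothetical exported comments; real feeds would carry many more fields.
COMMENTS = [
    {"date": "2024-03-01", "text": "Love my new Umbrella, kept me dry!"},
    {"date": "2024-03-01", "text": "The umbrella broke after a week."},
    {"date": "2024-03-02", "text": "These sandals are so comfortable"},
    {"date": "2024-03-02", "text": "Great service at the store"},
]

# Hypothetical product names to look for.
PRODUCTS = ["umbrella", "sandals"]


def mentions_by_day(comments, products):
    """Count comments per (date, product) using case-insensitive matching."""
    counts = Counter()
    for comment in comments:
        text = comment["text"].lower()
        for product in products:
            if product in text:
                counts[(comment["date"], product)] += 1
    return counts


for (date, product), count in sorted(mentions_by_day(COMMENTS, PRODUCTS).items()):
    print(date, product, count)
```

Because the result is keyed by date and product, it can be lined up with the same sales queries the team already runs.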


4. Incrementally train your staff

Many companies lack the data science and big data analysis skills that these projects require. This is what makes the approach of starting with your relational database data, and then gradually adding different types of big data, so appealing. You can grow your staff's knowledge of big data incrementally.

"There are tools and also consultants out there who can assist you as needed, but when your staff is starting from a relational database foundation that they already understand quite well, it is not such a big leap for them to begin working with big data that they append and build out from this base," said Johnson.

5. Consider a hybrid reporting environment for your data

Once you begin to append big data to your relational database queries, you will need to define another data repository for this data. Unstructured big data is a poor fit for a relational database. What you will need to do is define a big data repository such as Hadoop HDFS, and then move the combination of traditional and big data into this repository.
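Before combined data is moved into a big data repository, it has to be written in a form that can hold structured fields and free text side by side. One common choice is newline-delimited JSON, sketched below with Python's standard library. The record layout and values are hypothetical, and the upload step itself is left out because it depends on the repository and tooling you choose.

```python
import io
import json


def to_json_lines(records, out):
    """Write one JSON object per line to `out`; return the number written."""
    count = 0
    for record in records:
        out.write(json.dumps(record, sort_keys=True) + "\n")
        count += 1
    return count


# Hypothetical combined records: relational totals plus appended big data.
RECORDS = [
    {
        "date": "2024-03-01",
        "sales_total": 1200.0,
        "weather": "cloudy",
        "comments": ["Love my new umbrella"],
    },
    {
        "date": "2024-03-02",
        "sales_total": 800.0,
        "weather": "sunny",
        "comments": [],
    },
]

# An in-memory buffer stands in for the file that would be uploaded.
buffer = io.StringIO()
written = to_json_lines(RECORDS, buffer)
print(written, "records written")
```

Keeping one record per line means the file can be split and processed in parallel, which is part of why this layout is popular for large data sets.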

"The good news is that you don't necessarily have to incur a large capital expense to bring in new servers and storage for this work," said Johnson. "There are many cloud vendors that can host the data in a Hadoop or other big data database for you. They can also manage this data."

Final takeaway:

The best news of all for companies that are still struggling to get business meaning from their big data is that there is a step-by-step way forward. They can move their businesses and their IT staffs into productive big data projects by starting from a traditional database and reporting foundation that everyone is already familiar with.

This eases anxiety among business users and IT staff because they can start with what they know. It also reduces the risk of failure as you move into more ambitious big data projects.

Image: iStock/NicoElNino

About Mary Shacklett

Mary E. Shacklett is president of Transworld Data, a technology research and market development firm. Prior to founding the company, Mary was Senior Vice President of Marketing and Technology at TCCU, Inc., a financial services firm; Vice President o...
