Businesses have jumped on Big Data — but many quickly find themselves struggling to get their Big Data initiatives off the ground. What 10 bases should you cover in your Big Data strategy to ensure a successful launch into this new IT area?
1: Understand your business objectives
This might sound simple, but there is always a tendency to jump on a new technology bandwagon (with a vendor telling you what you need to do) before you have thought through exactly what you expect to accomplish with Big Data analysis for the business. Today, the companies getting the most out of Big Data already know how they want to harvest and mine it. If they are retailers, they might want to monitor social media and customer activities on their Web sites during a sale or promotion to see if they have targeted the right goods to the right consumers — and to adjust this on the fly if they need to. If they are manufacturers with supply chains, they might want to know from outside sources which transporters are the most reliable for shipments that must be delivered on time without fault. The sharper the focus of your Big Data business objective, the better results you'll achieve.
2: Clean your data first
Your Big Data analytics will only be as good as the data that goes into them. If you are burdened with incomplete or inaccurate data, fix it first. Data cleanups might be thankless projects, but it is important for the CIO to explain to upper management why the cleanup is necessary.
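In practice, a cleanup pass usually means deduplicating records and normalizing inconsistent values before they reach the analytics pipeline. As a minimal sketch — the field names, lookup table, and rules here are hypothetical, not any particular product's schema:

```python
# Minimal data-cleanup sketch: deduplicate customer records and
# normalize inconsistent state codes. The fields and rules are
# hypothetical examples for illustration only.

STATE_CODES = {"washington": "WA", "wash.": "WA", "wa": "WA",
               "oregon": "OR", "or": "OR"}

def clean_records(records):
    seen = set()
    cleaned = []
    for rec in records:
        # Normalize the state value via the lookup table.
        state = STATE_CODES.get(rec["state"].strip().lower(), rec["state"])
        key = (rec["email"].strip().lower(), state)
        if key in seen:          # drop duplicate rows
            continue
        seen.add(key)
        cleaned.append({"email": rec["email"].strip().lower(), "state": state})
    return cleaned

raw = [
    {"email": "Ana@example.com ", "state": "Wash."},
    {"email": "ana@example.com", "state": "WA"},   # duplicate of the first row
    {"email": "bo@example.com", "state": "oregon"},
]
print(clean_records(raw))
```

Even a toy pass like this makes the point to management: without it, the same customer can be counted twice and "Wash." and "WA" look like two different markets.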
3: Assess the readiness of your IT staff
Most companies don't have Big Data talent on staff. This is why many Big Data vendors and consultants report that companies are coming to them initially to jump-start Big Data projects. Therefore, the first step for CIOs should be to assess how ready their staffs are to take on Big Data before investments are made in Big Data technologies.
4: Revisit your IT infrastructure
Hand in hand with assessing your staff's readiness for Big Data is assessing your IT infrastructure for the inclusion of Big Data. Big Data uses parallel processing and streams large chunks of unstructured and semi-structured data through analytical algorithms. Traditional servers focus on speedy transaction processing against data of fixed and smaller record lengths. Between these two extremes is "old style" analytical reporting that runs in batch and feeds off data warehouses to produce daily, weekly, and monthly reports. Because Big Data processing is markedly different from transactional and batch reporting, you will likely need to make changes in your IT infrastructure to include new business rules, metrics, and monitoring tools for Big Data applications.
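The split-and-aggregate, parallel style described above can be illustrated with a toy map/reduce word count — a sketch of the processing pattern only, not any vendor's engine:

```python
# Toy illustration of the parallel, split-and-aggregate style that
# distinguishes Big Data workloads from row-at-a-time transaction
# processing. A sketch of the map/reduce pattern, not a real engine.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def map_chunk(chunk):
    # "Map" step: count words in one chunk of unstructured text.
    return Counter(chunk.lower().split())

def word_count(chunks):
    with ThreadPoolExecutor() as pool:
        partials = pool.map(map_chunk, chunks)   # chunks processed in parallel
    total = Counter()
    for partial in partials:                     # "Reduce" step: merge partials
        total += partial
    return total

chunks = ["big data big plans", "data streams and data lakes"]
print(word_count(chunks)["data"])   # -> 3
```

Note that the data has no fixed record length, and throughput comes from fanning the chunks out across workers — which is why infrastructure tuned for short, fixed-format transactions is a poor fit.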
5: Set both short- and long-term strategies
As a follow-up to points three and four, CIOs should set strategies for Big Data that take into account that staff and equipment might be at early stages of readiness when Big Data is first deployed, gaining maturity as time goes on. Many companies approach this by using Big Data consultants when they initially deploy Big Data, concurrently training (or recruiting) staff so that, at a given point, staff can take over.
On the Big Data equipment side, there might also be initial moves to outsource Big Data processing to a cloud provider. This gives the company the opportunity to look at initial Big Data processing use, to project future use, and to determine whether it makes ultimate economic sense to bring high-performance computing (HPC) resources for Big Data in-house.
6: Set the right KPIs for Big Data
Don't try to impose the same key performance indicators (KPIs) that you use for transaction processing on Big Data. Transactions (and transaction servers) are judged on throughput and speed of transactions. Big Data (and Big Data servers) should be judged on how many jobs are parallel-processed and completed in the most expeditious manner. This means that one large Big Data job might run for a number of hours, during which a number of smaller Big Data jobs have also run and completed.
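One way to express such a KPI is jobs completed per wall-clock hour across the cluster, rather than per-transaction latency. A hypothetical sketch, with made-up job names and a made-up measurement window:

```python
# Hypothetical KPI sketch: judge a Big Data cluster on parallel job
# completion over a measurement window, not on per-transaction speed.
# Job names, runtimes, and the window are invented for illustration.
def jobs_per_hour(jobs, window_hours):
    """jobs: list of (name, runtime_hours) tuples completed in the window."""
    return len(jobs) / window_hours

# One long job and several short jobs can all finish inside the same
# window because they run in parallel, not one after another.
window = [("churn-model", 6.0), ("clickstream", 0.5),
          ("ad-hoc-query", 0.2), ("log-scan", 0.8)]
print(jobs_per_hour(window, window_hours=8))  # -> 0.5
```

A transaction-style latency KPI would flag the six-hour job as a failure; a completion-rate KPI correctly credits the cluster for finishing all four jobs in the window.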
7: Spend time on your storage archiving methodology
Companies dive into Big Data processing projects, eager for analytics that can give the company competitive advantage. Unfortunately, they may not devote the same amount of time to architecting their strategies for Big Data archiving and access. If for no other reason than historical trending, business executives will want to go back and review data — and potentially query it with new sets of analytics. This is why it is important to tier your storage and archiving for best advantage. Data that is likely to be asked for (and that is historical) should be on storage media that can be quickly accessed on an as-needed basis. Data that is less likely to be requested should be on cheaper, slower storage media. Data that will never be used again should be purged. None of this storage archiving plan can be built without IT sitting down with end business decision makers to determine which Big Data is going to remain important for the business over the longer haul.
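A tiering policy like the one described can be reduced to a simple decision rule. The thresholds and tier names below are purely illustrative; in practice they would come out of the sit-down between IT and the business decision makers:

```python
# Hypothetical storage-tiering rule for archived Big Data. The 90-day
# threshold and tier names are illustrative assumptions, not standards.
def storage_tier(days_since_access, likely_to_be_queried):
    if not likely_to_be_queried:
        return "purge"   # data that will never be used again
    if days_since_access <= 90:
        return "fast"    # quickly accessible media for likely requests
    return "cold"        # cheaper, slower media for unlikely requests

print(storage_tier(30, True))    # -> fast
print(storage_tier(400, True))   # -> cold
print(storage_tier(400, False))  # -> purge
```

The point of encoding the rule is that it forces the business conversation: someone has to decide, per data set, what "likely to be queried" actually means.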
8: Begin with a small and manageable project
A Big Data project is no different from any other IT project. If the technology is new, start with a very small pilot project. This allows you to get your feet wet with the technology and determine what you will need to do on the IT side to support it. It also allows end users to see the analytical capabilities that Big Data presents. Just as essential, a small project gives IT and end users the opportunity to work out the best ways to collaborate for great analytics results.
9: Ask the right questions
Knowing how to query Big Data for greatest advantage is one of the biggest challenges that enterprises face. You can understand that consumers in the Midwest who are under 30 years of age are most likely to buy tickets to sporting events on snowy days in February only if you have correlated enough factors and come up with the right question to drive those answers. It isn't easy — and it's going to take time for companies to acquire or develop these skills.
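Arriving at a finding like the one above means correlating several factors — region, age, weather, and date — in a single query. A toy sketch of that kind of multi-factor question, with entirely invented fields and data:

```python
# Toy multi-factor query over purchase records. The fields and rows are
# invented purely to illustrate the shape of the question, not real data.
purchases = [
    {"region": "Midwest", "age": 27, "weather": "snow", "month": 2, "bought": True},
    {"region": "Midwest", "age": 45, "weather": "snow", "month": 2, "bought": False},
    {"region": "South",   "age": 24, "weather": "sun",  "month": 2, "bought": False},
    {"region": "Midwest", "age": 22, "weather": "snow", "month": 2, "bought": True},
]

def buy_rate(rows, region, weather, month, max_age):
    # Correlate all four factors at once, then compute the purchase rate.
    matched = [r for r in rows
               if r["region"] == region and r["weather"] == weather
               and r["month"] == month and r["age"] < max_age]
    return sum(r["bought"] for r in matched) / len(matched)

print(buy_rate(purchases, region="Midwest", weather="snow", month=2, max_age=30))
```

The hard part is not the query mechanics — it is knowing that these four factors, out of hundreds, are the ones worth correlating.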
10: Don't try to make do with old servers
In an effort to be cost-effective, some organizations try to repurpose x86 servers that they no longer need for transaction processing and convert them into Big Data processors. With its parallel processing requirements, Big Data is an entirely different animal — even though it often runs on x86 servers. If you're moving into Big Data applications, purchasing one or several specialized Big Data servers should be in your budget.
Mary E. Shacklett is president of Transworld Data, a technology research and market development firm. Prior to founding the company, Mary was Senior Vice President of Marketing and Technology at TCCU, Inc., a financial services firm; Vice President of Product Research and Software Development for Summit Information Systems, a computer software company; and Vice President of Strategic Planning and Technology at FSI International, a multinational manufacturing company in the semiconductor industry. Mary is a keynote speaker and has more than 1,000 articles, research studies, and technology publications in print.