Hadoop adoption is up, and TDWI says it will be a common enterprise tool within five years. These best practices help organizations get the most out of Hadoop.
A best practices survey from TDWI reports a considerable increase in how many enterprises plan to have Hadoop clusters in production. By Q1 2016, 60% of survey respondents expect to be in production, up from 16% when the report was published earlier this year. Further strengthening Hadoop's future as an enterprise tool, only 6% of organizations have ruled out Hadoop, down from 27% in 2012.
At these rates, TDWI predicts that Hadoop will be a "majority practice" within five years. As a focused summary of its Hadoop for the Enterprise report, TDWI provides a list of 10 priorities (which they call recommendations, requirements, or rules) that can help organizations new to Hadoop derive the most benefit from it.
Open source Apache Hadoop is a "framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models." In other words, it's a big data, data warehousing, and big data analytics tool.
Located in the Seattle area, TDWI provides business education and research about "all things data." For the Hadoop 2014 survey, TDWI conducted telephone surveys with 247 respondents beginning November 2014.
TDWI's top 10 priorities for enterprises using Hadoop
1: Be open to Hadoop and other new options.
Business should embrace the use of open source, new forms of analytics, data structures, and sources, and new enterprise methods for leveraging big data. Writes TDWI: "You can embrace and guide change so it leads to improvements, or you can maintain the status quo as opportunities pass by."
2: Innovate with big data on enterprise Hadoop.
89% of the TDWI survey respondents view Hadoop as an opportunity for innovation. The report's author recommends using Hadoop to expand data samples for data mining and statistical analysis, using social data for more complete customer views, and leveraging the low cost of Hadoop to innovate enterprise approaches to budgeting, infrastructure provisioning, and funding.
3: Base Hadoop adoption on business and tech requirements.
According to TDWI, any one of the main benefits of Hadoop -- advanced analytics, big data leverage, data exploration, extending older data management platforms, archiving, cost containment -- is "compelling enough" to give it serious consideration. They add that if "your organization has all these requirements, they will lead you into the broad enterprise use of Hadoop outlined in the report."
4: Know the hurdles so you can leap over them.
The obstacles described in the report include: weak business support, security issues, and excessive hand coding. "Never let these stop you," writes TDWI. Survey respondents have solutions for all of these, and the ongoing development of the Hadoop ecosystem is working to decrease such hurdles.
5: Get training (and maybe new staff) for Hadoop and big data management.
Companies should focus on training and hiring data specialists -- data analysts, data scientists, and data architects -- people who can develop the applications for data exploration, analytics, archiving, and content management. The report's author writes: "When in doubt, hire and train data specialists, not application specialists, to manage big data." When TDWI asked how respondents are staffing for Hadoop, 73% said they are training current employees; 41% are hiring new employees with relevant experience; and 36% are using consultants.
6: Co-opt Hadoop to rethink the economics of data and content architectures.
Report interviewees describe how they have developed multiple platform types in their environments, each platform being a best of breed for specific workloads and user needs. This has also led to a new costing model where enterprise IT can direct data and processing to the least expensive platform for the work needed. "The low cost of Hadoop is the leading driver behind this enterprise-wide change in IT portfolios and architectures," writes TDWI.
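The costing model described above can be sketched in a few lines of Python. This is only an illustration of the idea of sending work to the least expensive platform that can handle it; the platform names, workload labels, and dollar figures are hypothetical and do not come from the TDWI report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    name: str
    cost_per_tb: float     # hypothetical monthly cost, in dollars per terabyte
    workloads: frozenset   # workload types this platform can serve


def cheapest_platform(platforms, workload):
    """Return the lowest-cost platform that supports the given workload."""
    candidates = [p for p in platforms if workload in p.workloads]
    if not candidates:
        raise ValueError(f"no platform supports workload {workload!r}")
    return min(candidates, key=lambda p: p.cost_per_tb)


# Illustrative portfolio only; the figures are made up for the example.
PLATFORMS = [
    Platform("data_warehouse", 5000.0, frozenset({"reporting", "sql_analytics"})),
    Platform("hadoop", 500.0, frozenset({"archiving", "exploration", "sql_analytics"})),
]

if __name__ == "__main__":
    for workload in ("reporting", "archiving", "sql_analytics"):
        print(workload, "->", cheapest_platform(PLATFORMS, workload).name)
```

In practice, a real routing decision would weigh more than unit cost (latency, governance, and skills, for example), so treat this as a starting point for discussion, not a sizing tool.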
7: Prepare for hybrid data ecosystems by defining places for Hadoop in your architecture.
TDWI suggests use cases where Hadoop's "enterprise-scope value" will become more understandable to new users. Beneficial "starter" use cases include: staging in a data warehouse environment, using Hadoop as a co-location point for large datasets to promote broad data exploration, processing data for advanced analytics, replacing archaic archives, and extending content management systems.
8: Consider Hadoop use cases outside the usual BI/DW and analytic applications.
BI/DW stands for business intelligence and data warehousing. "Archival and backup systems are outdated and ineffective in most firms," writes the author. Hadoop's low costs and scalability make it attractive for that use case. Other use cases, according to survey respondents, include content management, document management, and records management.
9: Look for capabilities that make Hadoop data look relational.
Relational functions, including SQL-based analytics, are essential to enterprise adoption of Hadoop since high-profile use cases require them. A number of vendors and open source organizations are developing better SQL support for Hadoop, and the report's author emphasizes that these improvements do not detract from Hadoop's "unique capabilities" as a NoSQL platform. "Part of the power of Hadoop is its ability to support many approaches to many types of data," writes TDWI, adding that "Hadoop gets more diverse almost daily" in that way.
10: Develop and apply a strategy for enterprise Hadoop.
Last but not least, organizations new to Hadoop should start a proof of concept (POC) project that would evaluate the business value of multiple use cases. Starting points might include amassing big data for exploration, discovery, and specific forms of analytics. The POC team can also test data warehouse extensions, archiving, content management, and storage provisioning. The ultimate goal of the POC project, writes the author, is "creating a prioritized list of Hadoop-based applications that will eventually stretch across the whole enterprise."