The top 10 big data frameworks used in the enterprise

Big data and machine learning programs are expanding across organizations, but finding tech talent to helm these initiatives is a challenge, according to a Qubole report.

Big data continues its rapid growth in the enterprise across all industries, according to a Tuesday report from Qubole and Dimensional Research. Organizations are using big data to drive IT projects, improve sales, and enhance customer service. And they're increasingly turning to big data frameworks to get the full value out of the information gathered across their business.

No single software framework dominates the big data landscape, according to the report, which surveyed 401 data professionals with big data responsibilities at large enterprises. In fact, 25% of organizations are using homegrown approaches for big data processing.

No framework is ubiquitous, but there are a few standouts. Here are the top 10 big data frameworks, according to the report:

  1. Spark (31%)
  2. Hive (17%)
  3. HBase (17%)
  4. MapReduce (15%)
  5. Presto (13%)
  6. Kafka (13%)
  7. Impala (11%)
  8. Storm (11%)
  9. Flink (9%)
  10. Pig (6%)

Many of these figures fall in the same range, but the report noted that usage of some frameworks rose from 2017 while usage of others fell. Spark, HBase, Presto, Kafka, Impala, Flink, and homegrown approaches all rose in popularity in 2018, while Hive, MapReduce, Storm, and Pig all dropped in usage.

Businesses may be prioritizing big data initiatives, but talent shortages remain a major issue, the report found. Three-quarters (75%) of respondents said they faced a headcount shortfall in engineers, scientists, and operators who can deliver big data value. While 79% of businesses said they want to increase their data team headcount in the next year, 83% said that it's very difficult to find data professionals with the necessary skills and experience.

Organizations experience several other challenges when it comes to big data, the report found. The most common big data roadblocks named were a lack of experience that slows progress (44%), keeping up with new data sources (42%), constantly evolving use cases (41%), too many manual tasks (38%), and the volume of data (34%).

As more companies look to implement machine learning programs across a wide range of use cases, strong big data practices become essential, the report noted. Top priorities for machine learning initiatives in the next year include improving data security and threat protection, optimizing the customer experience, and leveraging predictive maintenance, according to the report.

The big takeaways for tech leaders:

  • Spark, Hive, and HBase are the most popular big data software frameworks used in the enterprise. — Qubole, 2018
  • 75% of data professionals said they faced a headcount shortfall in engineers, scientists, and operators who can deliver big data value. — Qubole, 2018

About Alison DeNisco Rayome

Alison DeNisco Rayome is a Staff Writer for TechRepublic. She covers CXO, cybersecurity, and the convergence of tech and the workplace.
