Now in general availability, the new analytics tool for use with the Delta Lake schema could make unwieldy data lakes a thing of the past.
Cloud data analytics firm Databricks announced the general release of Delta Engine, a data lake analytics tool it said is eight times faster than Apache Spark.
Delta Engine operates on data lakes built using the open source Delta Lake schema, which is designed to make ACID transactions possible on data lakes, something that was previously reserved for data warehouses, said Databricks CEO Ali Ghodsi.
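To make the idea of transactions on a data lake more concrete, here is a minimal Python sketch of one way such guarantees can be layered on top of plain files: data files only become visible once an entry in an append-only commit log refers to them. Everything in it (`ToyTable`, the `_log` directory, the JSON layout) is a hypothetical illustration, not Delta Lake's actual API or file format, and it leaves out durability, partial-write handling, deletes, and schema enforcement.

```python
import json
import os
import tempfile


class ToyTable:
    """A toy table: data files plus an append-only commit log (hypothetical)."""

    def __init__(self, root):
        self.root = root
        self.log_dir = os.path.join(root, "_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _versions(self):
        # Committed versions are the numbered JSON files in the log directory.
        return sorted(int(name.split(".")[0]) for name in os.listdir(self.log_dir))

    def write_data_file(self, name, rows):
        # Writing a data file does not make it visible; only a commit does.
        with open(os.path.join(self.root, name), "w") as f:
            json.dump(rows, f)

    def commit(self, expected_version, added_files):
        # Exclusive creation lets only one writer claim a given version:
        # if the log entry already exists, os.open raises FileExistsError.
        path = os.path.join(self.log_dir, f"{expected_version:08d}.json")
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        with os.fdopen(fd, "w") as f:
            json.dump({"add": added_files}, f)

    def read(self):
        # Readers replay the log and ignore any file no commit refers to.
        rows = []
        for version in self._versions():
            with open(os.path.join(self.log_dir, f"{version:08d}.json")) as f:
                entry = json.load(f)
            for name in entry["add"]:
                with open(os.path.join(self.root, name)) as f:
                    rows.extend(json.load(f))
        return rows


def demo():
    with tempfile.TemporaryDirectory() as root:
        table = ToyTable(root)
        table.write_data_file("part-0.json", [{"id": 1}])
        table.commit(0, ["part-0.json"])
        # This file is written but never successfully committed.
        table.write_data_file("part-1.json", [{"id": 2}])
        visible = table.read()
        try:
            # A second writer trying to reuse version 0 is rejected.
            table.commit(0, ["part-1.json"])
            conflict = False
        except FileExistsError:
            conflict = True
        return visible, conflict
```

In this sketch, `demo()` shows the two properties a warehouse user would expect: uncommitted data stays invisible to readers, and two writers cannot both claim the same version of the table.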
Delta Lake was also built by Databricks and released in 2017, before being donated to the Linux Foundation in 2019, and is now used by large organizations like Comcast, Nielsen, and Shell.
The concept behind Delta Lake was to drive more businesses to make use of the "lakehouse" model, which Ghodsi said is a "best of both worlds" approach that merges the business analytics capabilities of data warehouses with the massive quantities of data stored in data lakes.
"The rise of artificial intelligence means we can now ask predictive questions of our data, but data warehouses don't support prediction and it's nearly impossible to make them do so," said Ghodsi.
Data lakes, built to address those shortcomings, quickly become data swamps filled with data that can't be analyzed. "It's hard to ask structured questions of unstructured data," Ghodsi said.
Delta Lake is designed to curate data lakes so they behave more like structured data, and Delta Engine is introduced as a replacement for Apache Spark as an analytics tool, reportedly performing queries up to eight times faster.
Unlike Spark, which was originally developed in 2009 by Databricks co-founder and CTO Matei Zaharia, Delta Engine is optimized for lakehouse-style data and is built for modern hardware that supports single instruction, multiple data (SIMD) instruction sets.
Along with the general release of Delta Engine, Databricks also announced its acquisition of the open source Redash project, a data analytics dashboard platform. Redash can be connected to Delta Engine to make visualizing data stored in lakehouses available to non-analytics professionals and external users.
Redash can be connected to any of Databricks' Unified Data Analytics Platform products with a connector application and will be fully integrated into Databricks in the coming months.
"Curated cloud data lakes provide organizations a way to run any kind of analytics, including data science and machine learning, on all their most recent data. Our introduction of Delta Engine and the acquisition of Redash are significant steps forward in helping organizations build these high quality, curated data lakes that some call 'Lakehouses,'" Ghodsi said.