Download the new edition of Learning Spark from O’Reilly

Provided by: Databricks
Topic: Big Data
Format: PDF

As the most active open-source project in the big data community, Apache Spark™ has become the de facto standard for big data processing and analytics. Spark’s ease of use, versatility, and speed have changed the way that teams solve data problems, and that has fostered an ecosystem of technologies around it, including Delta Lake for reliable data lakes, MLflow for the machine learning lifecycle, and Koalas for bringing the pandas API to Spark.

We’re proud to share the complete text of O’Reilly’s new Learning Spark, 2nd Edition, with you. It covers the features introduced in the Apache Spark 3.0 release and will help you:

  • Learn the Python, SQL, Scala, or Java high-level APIs: DataFrames and Datasets
  • Inspect, tune, and debug your Spark operations with Spark configurations and Spark UI
  • Perform analytics on batch and streaming data using Structured Streaming
  • Build reliable data pipelines with open source Delta Lake and Spark
  • Develop machine learning pipelines with MLlib and productionize models using MLflow
  • Use Koalas, the open-source implementation of the pandas API on Spark, for data transformation and feature engineering