Spark SQL: Relational Data Processing in Spark
Spark SQL is a new module in Apache Spark that integrates relational processing with Spark's functional programming API (application programming interface). Built on our experience with Shark, Spark SQL lets Spark programmers leverage the benefits of relational processing (e.g., declarative queries and optimized storage) and lets SQL (Structured Query Language) users call complex analytics libraries in Spark (e.g., machine learning). Compared to previous systems, Spark SQL makes two main additions. First, it offers much tighter integration between relational and procedural processing, through a declarative DataFrame API that integrates with procedural Spark code.