Jaql: A Scripting Language for Large Scale Semistructured Data Analysis
In this paper, the authors describe Jaql, a declarative scripting language for analyzing large semi-structured datasets in parallel using Hadoop's MapReduce framework. At the time of the paper, Jaql is used in IBM's InfoSphere BigInsights and Cognos Consumer Insight products. Its main design features are a flexible data model, reusability, varying levels of abstraction, and scalability. The data model is inspired by JSON and can represent datasets ranging from flat, relational tables to collections of semi-structured documents. A Jaql script can start without any schema and evolve over time, first to a partial schema and then to a rigid one.
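
To make the data-model claim concrete, the following is a minimal sketch in Python, not Jaql syntax and not taken from the paper. It uses plain dictionaries and lists as stand-ins for JSON-like values, and a hypothetical `conforms` helper to mimic the progression from no schema, to a partial schema, to a rigid one. All names (`flat_rows`, `documents`, `conforms`, `count_conforming`) and the sample records are assumptions made for illustration.

```python
# Illustrative only: plain Python stand-ins for JSON-like values, not Jaql syntax.

# A flat, relational-style collection: every record has the same atomic fields.
flat_rows = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
]

# A semi-structured collection: nesting, arrays, and fields that come and go.
documents = [
    {"id": 1, "name": "alice", "tags": ["a", "b"]},
    {"id": 2, "address": {"city": "Oslo"}},
]


def conforms(record, required, closed=False):
    """Check one record against a hypothetical schema.

    required: dict mapping field name -> expected Python type.
    closed:   if True, no fields beyond `required` are allowed (rigid schema);
              if False, extra fields are tolerated (partial schema).
    """
    for field, expected_type in required.items():
        if field not in record or not isinstance(record[field], expected_type):
            return False
    if closed and set(record) != set(required):
        return False
    return True


def count_conforming(records, required, closed=False):
    """Count how many records satisfy the schema."""
    return sum(conforms(r, required, closed) for r in records)


no_schema = {}                            # stage 1: accept anything
partial_schema = {"id": int}              # stage 2: constrain only what is known
rigid_schema = {"id": int, "name": str}   # stage 3: exact set of fields

print(count_conforming(documents, no_schema))                   # open, no constraints
print(count_conforming(documents, partial_schema))              # open, id must be int
print(count_conforming(documents, rigid_schema, closed=True))   # closed, exact fields
print(count_conforming(flat_rows, rigid_schema, closed=True))   # closed, exact fields
```

In this sketch the same collection of documents passes under the empty and partial schemas but fails under the rigid one, while the flat rows satisfy all three. That is the sense in which a script can begin with loosely described data and tighten its assumptions as the data becomes better understood.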