By Edward Tittel

In a sense, it’s appropriate to describe Cocoon as an XML-based content management system because it provides a mechanism for describing the structure and semantics of Web information items (the content), the relationships between and among such items and how they change over time (their logic), and the ways in which they should be presented to viewers when requested (their style). In fact, the Cocoon project started as a part of the overall Apache Web server project as an effort to organize and control the documentation for the many projects that operate under the Apache umbrella.

As a metaphor for its real function, the choice of the name Cocoon is singularly apt: a surrounding structure within which something is transformed from the larval stage into something beautiful and ready for flight. The Apache Cocoon Web page doesn’t delve into the poetry behind the name, but its summation of the environment is entirely accurate: “Apache Cocoon is an XML publishing framework that raises the usage of XML and XSLT technologies for server applications to a new level. Designed for performance and scalability around pipelined SAX processing, Cocoon offers a flexible environment based on a separation of concerns between content, logic, and style.”

What exactly is Cocoon?
Cocoon began its existence as a simple Java servlet that used standard W3C components to do its job: the document object model (DOM) for parsing documents, XML for capturing and formatting data, XSLT for transforming data and merging or manipulating XML documents, and XSL to manage document presentation for Web delivery. A need soon arose to serve other kinds of content (e.g., programs as well as documents), and Cocoon has evolved into a complete XML-based publishing framework and system.

Over time, several new XML components were introduced, such as SAX and the subdivision of XSL into Transformations (XSLT), Formatting Objects (XSL-FO), and XPath functionality. These new standards resulted in the introduction of Cocoon 2 in 2002, which defines a standard (but still evolving) content management system that is freely available as open source software under the Apache license.

In this current incarnation, it’s reasonable to describe Cocoon in several different ways: as an XML publishing framework, a data source aggregator, and at its most basic level, a collection of pipelines and components.

Cocoon as a publishing framework
Cocoon is based on pipelined processing of SAX events, which provides scalability and good performance for Web applications built around its framework. A centralized configuration system supports the work involved in creating, deploying, and maintaining Web-based applications. It uses a caching system whereby components may be dynamically configured as needed. Each incoming user request triggers a cache check to see whether content for the requested Uniform Resource Identifier (URI) is already present. If so, that content may be delivered without processing it through a pipeline.
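
To make the cache-before-pipeline idea concrete, here is a minimal sketch in Python. It is not Cocoon's Java implementation: the class name CachingPublisher and the stand-in pipeline function are assumptions made for illustration, and the sketch ignores the question of when cached content should be considered stale.

```python
class CachingPublisher:
    """Illustrative only: look in a cache by URI before running a pipeline."""

    def __init__(self, pipeline):
        self._pipeline = pipeline   # hypothetical callable: uri -> response text
        self._cache = {}            # maps URI -> previously produced response
        self.pipeline_runs = 0      # counts how often the pipeline really ran

    def handle(self, uri):
        # Cache hit: deliver the stored content without touching the pipeline.
        if uri in self._cache:
            return self._cache[uri]
        # Cache miss: run the pipeline once and remember its output.
        self.pipeline_runs += 1
        response = self._pipeline(uri)
        self._cache[uri] = response
        return response


# Stand-in pipeline (an assumption): wraps the URI in a trivial HTML body.
publisher = CachingPublisher(lambda uri: "<html><body>%s</body></html>" % uri)
publisher.handle("/docs/index.html")
publisher.handle("/docs/index.html")   # second request is served from the cache
```

After the two requests above, the stand-in pipeline has run only once, which is the saving the cache is meant to provide.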

Cocoon as data source aggregator
Cocoon functions as an abstract engine (through a Java servlet) based on customized protocol handlers that can access external data sources through a standard URI. Cocoon can even call itself recursively so that data streams can be processed in multiple pipeline stages quickly and efficiently.
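
The handler-per-protocol idea can be sketched as a small registry keyed by URI scheme. This is a conceptual illustration in Python rather than Cocoon's own API: SourceResolver and the scheme names "memory" and "internal" are hypothetical, chosen only to show one handler calling back into the resolver in the recursive style described above.

```python
from urllib.parse import urlsplit


class SourceResolver:
    """Illustrative only: dispatch a URI to a handler chosen by its scheme."""

    def __init__(self):
        self._handlers = {}

    def register(self, scheme, handler):
        # handler is a callable that takes the URI path and returns content.
        self._handlers[scheme] = handler

    def resolve(self, uri):
        parts = urlsplit(uri)
        handler = self._handlers.get(parts.scheme)
        if handler is None:
            raise ValueError("no handler registered for scheme %r" % parts.scheme)
        return handler(parts.path)


# Hypothetical in-memory data source standing in for an external one.
DOCUMENTS = {"/greeting": "<greeting>hi</greeting>"}

resolver = SourceResolver()
resolver.register("memory", lambda path: DOCUMENTS[path])
# The "internal" handler calls back into the resolver, a simple form of recursion.
resolver.register(
    "internal",
    lambda path: "<wrapper>%s</wrapper>" % resolver.resolve("memory:" + path),
)

wrapped = resolver.resolve("internal:/greeting")
```

Because every source is addressed by URI, adding a new kind of data source in this sketch means registering one more handler and leaving the calling code untouched.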

Pipelines and components, oh my!
Modularity and abstract processing sit at the heart of the Cocoon architecture. Cocoon pipelines are conceptually similar to those used in UNIX systems, except that all elements in a Cocoon pipeline are SAX events created by parsing XML documents. Cocoon recognizes three types of pipeline elements, called components: generators, transformers, and serializers. Generators use a requested URI to produce SAX events; transformers consume SAX events and produce other SAX events in turn; serializers consume SAX events and produce some response in return.
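
The three component roles can be modeled with Python's standard xml.sax module, which offers a parser (playing the generator), a filter base class (the transformer), and an XML writer (the serializer). This is only an analogy to the roles described above, not Cocoon's Java classes; the UppercaseTitles filter and the sample element names are assumptions made for the example.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class UppercaseTitles(XMLFilterBase):
    """Transformer: consumes SAX events and passes modified events downstream."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self._in_title = False

    def startElement(self, name, attrs):
        self._in_title = (name == "title")
        super().startElement(name, attrs)

    def endElement(self, name):
        self._in_title = False
        super().endElement(name)

    def characters(self, content):
        # Only text directly inside <title> is changed; everything else passes through.
        super().characters(content.upper() if self._in_title else content)


def run_pipeline(xml_text):
    """Wire generator -> transformer -> serializer and return the serialized XML."""
    out = io.StringIO()
    serializer = XMLGenerator(out, encoding="utf-8")   # SAX events -> XML text
    generator = xml.sax.make_parser()                  # XML text -> SAX events
    transformer = UppercaseTitles(generator)           # SAX events -> SAX events
    transformer.setContentHandler(serializer)
    transformer.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()


result = run_pipeline("<doc><title>hello</title><p>world</p></doc>")
```

No stage ever holds the whole document in memory. Each one reacts to events as they stream past, which is the property that pipelined SAX processing relies on for scalability.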

What can Cocoon do for you?
At the barest minimum, a Cocoon pipeline to produce and deliver content in some recognizable form consists of a generator and a serializer. More typical Cocoon pipelines consist of generators that may be followed by one or more chains of transformers and serializers to produce different kinds of output. In this way the same source documents might, for example, be delivered in HTML format to a Web browser, in WML for a WAP device, or in PDF format for print output.
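
As a rough illustration of one source feeding several output formats, the following Python sketch pairs a single parsed document with two different serializers. The element names (title, para), the serializer classes, and the deliver function are hypothetical; a real Cocoon setup would typically place an XSLT transformer ahead of a format-specific serializer rather than hard-coding the mapping this way.

```python
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import escape


class HtmlSerializer(ContentHandler):
    """Serializer: turns SAX events into a small HTML fragment."""

    TAGS = {"title": "h1", "para": "p"}   # hypothetical source vocabulary

    def __init__(self):
        super().__init__()
        self.parts = []

    def startElement(self, name, attrs):
        if name in self.TAGS:
            self.parts.append("<%s>" % self.TAGS[name])

    def endElement(self, name):
        if name in self.TAGS:
            self.parts.append("</%s>" % self.TAGS[name])

    def characters(self, content):
        self.parts.append(escape(content))

    def result(self):
        return "".join(self.parts)


class TextSerializer(ContentHandler):
    """Serializer: turns the same SAX events into plain text, one block per line."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def endElement(self, name):
        if name in ("title", "para"):
            self.parts.append("\n")

    def characters(self, content):
        self.parts.append(content)

    def result(self):
        return "".join(self.parts)


SERIALIZERS = {"html": HtmlSerializer, "text": TextSerializer}


def deliver(xml_text, output_format):
    """Generate SAX events from the source and feed them to the chosen serializer."""
    serializer = SERIALIZERS[output_format]()
    xml.sax.parseString(xml_text.encode("utf-8"), serializer)
    return serializer.result()


SOURCE = "<doc><title>Cocoon</title><para>One source, many formats.</para></doc>"
html_output = deliver(SOURCE, "html")   # "<h1>Cocoon</h1><p>One source, many formats.</p>"
text_output = deliver(SOURCE, "text")   # "Cocoon\nOne source, many formats.\n"
```

The source document never changes. Only the last stage of the chain differs, which is the separation of content from style that the framework is built around.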

What makes Cocoon so interesting as a development platform is the vast list of generators, transformers, and serializers that have been developed and donated to the Cocoon 2 project. Once donated, these components can be customized to add new capabilities or extended to create new components. I’ve compiled a few of the widely available generators, transformers, and serializers for Cocoon in Table A.
Table A: The short list of available components for Cocoon

Generators
Converts a directory listing into XML format from which SAX events are produced
Parses a file or URI and produces SAX events
Generates XML and SAX events from JSP pages
Generates XML and SAX events from XSP pages

Transformers
Transforms SAX events using an i18n dictionary and a language parameter value
Processes the XInclude namespace and includes external sources by adding SAX events to the existing SAX stream
Transforms the SAX event stream based on an XSLT stylesheet definition

Serializers
Produces HTML responses from SAX events
Produces a PDF from SAX events using the Apache Formatting Objects Processor (FOP)
Produces a JPEG from SVG SAX events using Apache Batik
Produces plain text output from SAX events, useful for non-XML text such as CSS or programming language code
Produces XML responses from SAX events

As you can see, Cocoon’s capabilities extend beyond simply formatting data as HTML. In future articles, I will examine document creation and handling in the Cocoon environment and describe what’s involved in using this environment in more detail. For now, understanding Cocoon as a way to capture, render, and deliver Web-based content for a variety of purposes and consumers should be enough to pique your curiosity.