By Beth Bowden
Aggregate functions, those that act on a group of rows like max and min, are helpful to developers looking to simplify the logic required to perform a task. Beginning with Oracle 9i, developers have been able to create user-defined aggregate functions that can be used over again without editing a single line. Through powerful programming constructs, aggregate functions are actually fairly easy to create. Let's look at how it's done.
To see just how easy, take a look at a sample aggregate function, called toList. The toList function takes a set of records and groups the data contained in a specified field into a comma-delimited list. Consider the following query as an example:
select author, toList(title) titles
group by author;
The result of running the above code would look something like the list in Figure A.
|The toList function created a list of book titles.|
While I’ll be using Oracle’s PL/SQL language to create my toList function, you can use any Oracle supported language such as C++ or Java. All code can be executed in SQL*Plus, SQL*Spreadsheet or other Oracle SQL applications.
Creating an aggregate function involves five steps: Plan the logic for your function, create an object type header for the function, code the body of the object type, instantiate the object type, and finally, test your new function. I’ll now walk you through those steps in order.
Planning the logic
I’m a fan of formally planning out the routines I write, so I created the pseudocode in Listing A for toList. I’ll use that for my logic plan. Oracle actually handles quite a bit of the process for me, meaning I have to actually write the code for declaring the list variable, the if block that adds the commas, and the field name as part of the return value.
Create an object header
The object header defines persistent variables and declares the functions the object will contain. All objects must contain the functions ODCIAggregateInitialize, ODCIAggregateIterate, ODCIAggregateMerge, and ODCIAggregateTerminate, which, for reasons of readability, I will refer to as Initialize, Iterate, Terminate, and Merge, respectively. I’ll explain what these functions do when I talk about their code in the next step. Check out Listing B for the toList object header.
In Listing B, toListImp1 is the name of the object type, and list is the only persistent variable I’ve identified.
Code the member functions
Once you've created the header, you must code the actual functions. Listing C contains the function code for toList. Although the code listing is long, most of it is boilerplate that won't change much from one function to another.
Each function returns a number that indicates success or failure. ODCIConst.success represents success. Now that you've seen the code for them, it should be easy to understand what three of the four mandatory functions do:
- Initialize will execute once for each distinct value of the group by field. Naturally, I’ll need a new list for each distinct value, so I initialize the persistent variable, list, to null here.
- Iterate will execute for each row returned, so it will build the actual list of values. You can see that I test to see if list is empty, in which case I set it to the incoming value variable. If list is not empty, I append a comma and the incoming variable to it.
- Terminate executes once at the end of the function after all rows for a distinct value of our group by field have been processed. Since the only cleanup I need to do is taken care of in Initialize, I simply return the list variable.
The fourth function, Merge, is merely a stub that returns success.
Create the aggregate function
Once you've written all the code, the only thing left is to create the aggregate function using the toListImp1 type you just created by running the following command:
create or replace function toList (input STRING) return STRING
aggregate USING toListImp1;
Now for the fun part: testing. You can use the code in Listing D to create a sample books table and load some data into it for testing.
Enabling a function for parallel processing
Another power of aggregate functions is that you can use them to take advantage of Oracle’s parallel processing features. Parallel-enabling your aggregate function involves writing code for the Merge function and a slight change to the script used to the aggregate function. Note that the code I’ve given you so far is complete and will run successfully; adding parallelism is optional.
When an aggregate function is run on multiple processors, the Oracle 9i server can run the Iterate function on two or more processors. Each processor will return a separate context, and the Merge function exists to merge those contexts into a single result. The Oracle 9i Server will call the Merge function iteratively until all contexts are combined.
To enable toList for parallel processing, I first need to write the code for the Merge function, which you'll find in Listing E.
I also need to declare the aggregate function as being parallel enabled when I create it:
create or replace function toList (input STRING) return STRING
parallel_enable aggregate USING toListImp1;
And that’s it. The toList function can now take full advantage of multiple processors.
If you want to use your function but do not want to group by any field, group by a constant instead. For example:
Select toList(distinct Author) authors From books Group by ‘1’
You’ll want to bulletproof your aggregate functions just like you would any other program. For example, the code for my Merge function should probably check whether concatenating the two contexts would exceed the maximum number of characters for list (32,767).
If your Oracle server isn't configured for parallel processing, consider parallel-enabling your function anyway. This takes very little additional time to do and enhances the portability and reusability of your function in other processing environments.
Finally, Oracle 9i will allow you create some aggregate functions that simply will not run. For example, although Oracle allows you to create an aggregate function (i.e., using more than one function) that accepts more than one parameter, using that function will generate an error. Similarly, although you can code overloaded aggregate functions, they will fail when you try to use them.
The ability to create your own aggregate functions is a great way to extend Oracle with statistical or text manipulation routines that are not built in to Oracle itself. The possibilities are nearly endless, so you really should add this skill to your repertoire.