Data Management

TechRepublic Tutorial: Increase your SQL power with user-defined functions

Add to SQL Servers capabilities by adding your own functions


In SQL 2000, Microsoft introduced its implementation of user-defined functions (UDFs). You’re probably already familiar with many of the built-in functions such as GetDate(), User_Name(), and so on. You may not know that the master database contains a collection of UDFs as well, some of which are encrypted; others are fully available for your inspection. For a complete list of the UDFs, open Enterprise Manager and then expand the master database and, within it, the User-Defined Functions node.

In SQL 2000, you can write two kinds of UDFs—scalar functions and table functions. Scalar functions return a single value of any data type, much the same as user-defined functions in languages such as Visual Basic. By contrast, table functions return tables. I'll discuss both in this article. Before we begin our exploration, however, we must address two concepts: determinism and schema binding.

Determinism
Determinism means that the same set of inputs will always result in the same output(s). If a function satisfies this demand, it's called a deterministic function; otherwise, it's a non-deterministic function. The romantically inclined might be tempted to say that such a function has free will. This distinction applies to built-in functions as well as user-defined functions.

Functions such as DateAdd() are deterministic, since given the same set of parameters, they're guaranteed to return the same results. Functions such as GetDate() are non-deterministic, because while they're always called with the same parameters, they return different results every time. Regarding UDFs, the rules are a little more complicated, since their code could contain almost anything. A UDF is deterministic if:
  • The function is schema-bound (more on that in a moment).
  • All built-in or user-defined functions called by the user-defined function are deterministic. (For example, if you call GetDate() from your function, you inherit its non-determinism.)
  • The body of the function references no database objects outside the scope of the function. For example, a deterministic function cannot reference tables other than table variables that are local to the function.
  • The function does not call any extended stored procedures.

The schema-binding problem
Imagine a function, fn_Problematic_Select(), which selects * from one or more tables. Imagine that after writing this function, you insert a new column in said table(s). Subsequent invocations of fn_Problematic_Select() will not show your new column.

So the question becomes, should you ever select * from table(s)? In my opinion, the answer is emphatically no, but you may do otherwise. The way around this difficulty is to use schema binding, which binds the function to the objects it references, making it impossible to alter the referenced objects (tables, views, or other functions) unless either:
  • The function is dropped, or
  • The function is altered without using the schema-binding directive.

You cannot use the schema-binding option unless all of the following conditions are true:
  • Any views or user-defined functions referenced by the function are also schema-bound.
  • The objects referenced by the function are not referenced using a two-part name.
  • The function and the objects it references belong to the same database.
  • The user who created the function has the REFERENCES permission on all the database objects that the function references.

If any of these conditions do not hold true, the CREATE FUNCTION statement will fail.

What exactly can you do within a multi-statement function? Any of the following types of statement are permitted:
  • Assignment statements
  • Control-Flow statements
  • DECLARE statements defining local data variables
  • SELECT statements containing select lists with expressions that assign values to variables that are local to the function
  • Cursor operations referencing local cursors that are declared, opened, closed, and unallocated in the function. You can use FETCH statements to assign values to local variables using the INTO clause, but you cannot use FETCH statements that return data to the client.
  • INSERT, UPDATE, and DELETE statements modifying table variables local to the function. That is, you can create a local table and do anything you want to it, but you cannot change data in external tables.
  • EXECUTE statements that call extended stored procedures.

Now that we have the rules for creating and using functions, let’s explore some scalar UDFs.

Scalar functions
In the Access Northwind sample database, the Order Details table does not contain a column to house the extended amount. This is as it should be, the general principle being "do not store what can be calculated." However, when you need to calculate this value, you have to rewrite its formula over and over, and as you escalate (calculate the total extended amount for each order and then for each customer), the expression grows awkwardly.

Listing A addresses this problem with a function called fn_ExtendedAmount.

Note
The code in this article uses the Northwind sample database.

 

This enables us to write a query such as Listing B, which sums the extended amount for a given order.

Better yet, let’s turn it into a function. Listing C shows the function equivalent.

Moving up the relational hierarchy, we now create a function to derive the total purchases of a given customer. Listing D shows this function.

We can now call this function in a query such as that in Listing E.

This query results in a listing, as illustrated in Figure A.

Figure A
Results of the query produced in Listing E


Calling built-in UDFs
Calling a built-in UDF, the syntax is slightly different. The function name must be prefaced with a pair of colons. For example, if you try to call the function sp_helpcollations() in the normal way, you get the error “Invalid object name.” Instead, you must call it this way:
select * from ::fn_helpcollations()

Table functions
There are two types of table functions: in-line functions and standard table functions. For our purposes, we will focus on the more powerful in-line functions. In-line functions are a subset of table functions, with several distinguishing qualities:
  • The RETURNS statement contains only the keyword TABLE. You don’t have to specify the structure of the table since it derives from the SELECT statement.
  • There is no function body delimited by the BEGIN and END keywords.
  • The RETURN statement consists of a valid SELECT statement enclosed within parentheses.

In-line functions
As shown in Listing F, let’s begin with a very simple in-line function that accepts a prefix (the first few letters of the company name) returning all the customers whose names match the prefix.

To call this function for a specific prefix, you write:
SELECT *
FROM dbo.fn_CustomerList('A')
GO


The restriction that you can use only a SELECT statement on the RETURN line is not much of a restriction because the SELECT can be arbitrarily complex. Listing G shows a table function that builds on what we have already written. You can call scalar functions from table functions, simplifying your SQL even more.

To call this function for a specific year, write:
SELECT *
FROM dbo.fn_SalesByProduct()


Since table functions return tables, you can use them anywhere you can use a table. That is, you can join table functions to views, tables, and other table functions.

Extending the power of table UDFs with default parameters
As presented in Listing H, the function fn_SalesByProduct() provides a row for every product. This is occasionally useful, but more often we would want the total sales for a particular product. We could write another function or use a WHERE clause in our calling SQL, but there’s no need. We can revise the function slightly so that it can deal with both situations. Listing H shows this revision. It’s based on a clever trick: You don’t have to test a parameter against a column value; you can test it against itself. The changes to the code are miniscule—we default the parameter and then test it against itself.

Now, when we want the list of all products and their sales, we simply pass a zero as the parameter. Otherwise, we pass a specific ProductID and obtain its values. This trick applies to stored procedures as well. If you’re in the habit of writing two stored procedures rather than one to cover these situations or writing a single procedure containing an IF/ELSE construct that deals with both situations, this trick can simplify your T-SQL. The trick reduces to this construct:
WHERE
PK_Of_Interest = @PK
OR
@PK = 0


Subsetting data with table functions
With a little forethought, you can design table functions to consolidate your logic and provide different data to different users. Since a table function returns a table, you can select arbitrary columns from it. For example, you might have a function that returns EmployeeID, EmployeeName, DepartmentID, and Salary from your Employees table. User A can then select EmployeeID and EmployeeName from the function, while user B, who is entitled to see the salary column, can add that column to the select statement.

When to convert stored procedures to UDFs
Table functions can reduce your need for temporary tables, which is a good reason to learn how to use them. If you have any stored procedures that return result sets that you would like to join to other tables or views, they are excellent candidates for conversion. I don’t recommend wholesale conversion just for uniformity or “cool,” however. First of all, why fix what isn’t broken? On the other hand, if you can point to obvious benefits for converting, such as cleaner code, flexibility, and consolidation, then go ahead and convert.

If a stored procedure meets the following criteria, it is a good candidate for being rewritten as a table-valued function:
  • The stored procedure does not perform update operations (except to table variables).
  • You don’t build any dynamic SQL statements within the stored procedure.
  • The stored procedure returns only one result set.
  • The primary purpose of the stored procedure is to build intermediate results that are to be loaded into a temporary table, which is then queried in a SELECT statement.

 

Editor's Picks