Data Management

Understand when to use user-defined functions in SQL Server

In the simplest terms, a user-defined function (UDF) in SQL Server is a programming construct that accepts parameters, does work that typically makes use of the accepted parameters, and returns a type of result. This article will cover two types of UDFs: table-valued and scalar-valued.

In the simplest terms, a user-defined function (UDF) in SQL Server is a programming construct that accepts parameters, does work that typically makes use of the accepted parameters, and returns a type of result. This article will cover two types of UDFs: table-valued and scalar-valued. (I will not be covering aggregate functions.)

Types of UDFs

Table-valued functions

A table-valued UDF is a function that accepts parameters and returns the results in the form of a table. This type of function is special because it returns a table that you can query the results of and join with other tables. In SQL Server 2005, field values from other tables may be passed into the function during a join operation to return a record based result. To accomplish this, you must use SQL Server 2005's APPLY operator.

It can be difficult to know when it is appropriate to use a VIEW vs. when it is appropriate to use a table-valued UDF. VIEWs are a great tool for data abstraction, combining data, and logically using subsets of data. I like to use table-valued UDFs when I need to use one or more values from different tables in a join operation where some type of calculation needs to be done and an aggregation returned.

Scalar-valued functions

A scalar-valued UDF accepts parameters and, ultimately, returns a single, atomic value. There are seven reasons why these types of functions are different than stored procedures in the database engine.

  • You cannot modify data inside of a UDF.
  • A scalar-valued UDF returns only one value, where a stored procedure can have numerous OUTPUT parameters.
  • You can use scalar-valued UDFs as the default value for a column in a table.
  • Scalar-valued UDFs are an easy way to define constant values to use in your database environment.
  • You can pass field values as parameters into UDFs.
  • You can nest scalar function calls. This means that you can pass a call to a scalar-valued function to another function or stored procedure.
  • You can use the results from scalar-valued UDFs as criteria in a WHERE statement. Although you can do it, this is typically not a good idea. (Later in the article, I'll explain why I try to avoid this common practice.)

There are two types of scalar-valued UDFs: deterministic and non-deterministic. Recognizing the determinism of the functions that are created is important. An example of the importance is the creation of indexed views. One of the many restrictions of creating an index on a view is that the view definition cannot use a non-deterministic function.

Deterministic

A deterministic UDF always returns the same result with the same set of input parameters. Some examples of deterministic functions are the system functions MONTH(), YEAR(), and ISNULL().

Non-deterministic

A non-deterministic UDF is can potentially return a different value each time it is called with the same set of input parameters. Some examples of non-deterministic functions are the system functions GETDATE(), NEWID(), and @@CONNECTIONS.

Two examples of UDFs

Before presenting the examples, I will set up my SalesHistory table and load data into it:

IF OBJECT_ID('SalesHistory')>0   

      DROP TABLE SalesHistory;

CREATE TABLE [dbo].[SalesHistory]

(         

      [SaleID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,         

      [Product] [varchar](10) NULL,               

      [SaleDate] [datetime] NULL,               

      [SalePrice] [money] NULL

)

DECLARE @i SMALLINT

SET @i = 1         

WHILE (@i <=1000)

BEGIN                        

      INSERT INTO SalesHistory(Product, SaleDate, SalePrice)                     

      VALUES ('Computer', DATEADD(mm, @i, '3/11/1919'), DATEPART(ms, GETDATE()) + (@i + 57))               

      INSERT INTO SalesHistory(Product, SaleDate, SalePrice)         

      VALUES('BigScreen', DATEADD(mm, @i, '3/11/1927'), DATEPART(ms, GETDATE()) + (@i + 13))                    

      INSERT INTO SalesHistory(Product, SaleDate, SalePrice)             

      VALUES('PoolTable', DATEADD(mm, @i, '3/11/1908'), DATEPART(ms, GETDATE()) + (@i + 29))                           

      SET @i = @i + 1

END

GO

The first UDF I will look at is the scalar-valued UDF. The script below defines a function named dbo.udf_GetProductSales that accepts three parameters and returns a MONEY value. The function uses the three input parameters as criteria in calculating the total sales from the SalesHistory table.

CREATE FUNCTION dbo.udf_GetProductSales

(

      @Product VARCHAR(10),

      @BeginDate DATETIME,

      @EndDate DATETIME

)

RETURNS MONEY

AS

BEGIN

      DECLARE @Sales MONEY

     

      SELECT @Sales = SUM(SalePrice)

      FROM SalesHistory

      WHERE

            Product = @Product AND 

SaleDate BETWEEN @BeginDate AND @EndDate

      RETURN(@Sales)

END
The script below calls the UDF created in the above script. Note: The schema the function belongs to must be used in the call. In this case, the function belongs to the dbo schema.
SELECT dbo.udf_GetProductSales('PoolTable', '1/1/1990', '1/1/2000')

I usually discourage using scalar-valued UDFs in a WHERE criteria statement because, for every record considered in the query, the scalar-valued function will be called. This means that a function used in the WHERE criteria will cause a scan of the values being searched, which is going to be slower than if an index is able to be used. (I will provide more details on this concept in a future article.)

Although the use of a correlated sub-query is sometimes confusing and complicated, the use of them can help solve some of the more challenging query problems. While using these special queries is useful, they only return one column of data. You can use the upgraded table-valued UDFs in SQL Server 2005 to overcome this shortcoming. I'll show you how to use the APPLY operator to accept column values from a table and return a table-result of correlated values.

CREATE FUNCTION dbo.udf_GetProductSalesTable

(

      @Product VARCHAR(10),

      @SaleID INT

)

RETURNS @SalesTable TABLE

(

      SalesTotal MONEY,

      SalesCount INT

)

BEGIN

     

      INSERT INTO @SalesTable(SalesTotal, SalesCount)

      SELECT

            SUM(SalePrice), COUNT(SaleID)

      FROM

            SalesHistory

      WHERE

            Product = @Product AND

            SaleID <= @SaleID

      RETURN

END

GO

The above function accepts the particular product for which we were searching, along with the SaleID from the SalesHistory table. From the function definition, you can see that the function returns a table named @SalesTable that contains two columns: SalesTotal and SalesCount. The body of the function inserts aggregate values into the @SalesTable table variable based upon the input parameters.

The following code uses the APPLY operator to invoke the table-valued function with the values from the SalesHistory table. (Note: Logically, you may want to use a JOIN operator here, but it is not necessary. The APPLY operator essentially does the "JOIN" for us by applying the values from the SalesHistory table to the table-valued function. In a sense, this code works the same way a correlated sub-query does, except that it can return multiple correlated values.)
SELECT * FROM SalesHistory sh

CROSS APPLY dbo.udf_GetProductSalesTable(sh.Product, sh.SaleID)

ORDER BY sh.SaleID ASC

Additional TechRepublic resources on UDFs

Tim Chapman is a SQL Server database administrator who works for a bank in Louisville, KY, and has more than 7 years of IT experience. He is also Microsoft certified in SQL Server 2000 and SQL Server 2005. If you would like to contact Tim, please e-mail him at chapman.tim@gmail.com.

-----------------------------------------------------------------------------------------

Get SQL tips in your inbox

TechRepublic's free SQL Server newsletter, delivered each Tuesday, contains hands-on tips that will help you become more adept with this powerful relational database management system. Automatically subscribe today!

About

Tim Chapman is a SQL Server MVP, a database architect, and an administrator who works as an independent consultant in Raleigh, NC, and has more than nine years of IT experience.

7 comments
pixelwiz
pixelwiz

Can you please give some more details on when it is more appropriate to use a View vs a table-valued UDF? I have a few table-valued UDFs created that now I feel maybe should have been views, but I can't figure out when it's better to use one vs the other.

unclebiguns
unclebiguns

So you are saying that if I use the a Scalar UDF ona variable or parameter in a where clause, it will not use an index? For example: Select product_id, product_name from products where product_group_id = dbo.GetProductGroupIdFromName(@product_group_name) will not use an index on product_group_id but if I did this: Select @product_group_id = dbo.GetProductGroupIdFromName(@product_group_name) Select product_id, product_name from products where product_group_id = @product_group_id) will use the index on product_group_id?

alaniane
alaniane

Thanks for the article on UDFs. I also like to use UDFs when designing queries. It makes it easier to break complex queries into smaller chunks for debugging purposes. Later on, I will replace the UDFs if I need to improve the performance of the query. I have discovered that sometimes the UDF actually performs better.

gkanch
gkanch

Thanks for this great article !! I really like the advise you gave -- "I usually discourage using scalar-valued UDFs in a WHERE criteria statement because...". It explains poor performance of my SQL Procedures I have done in the past. There are some details omited that leaves me wondering what exactly what you mean. For example, "I like to use table-valued UDFs when I need to use one or more values from different tables in a join operation where some type of calculation needs to be done and an aggregation returned." makes me wonder if one value is returned, then why do we need a table ? An example will be nice. Thanks again for this insightful article!

thisisfutile
thisisfutile

Good information. You said, "I usually discourage using scalar-valued UDFs in a WHERE criteria statement" and that's exactly some of the things a developing database manager, like myself, needs to know. It's one of those things I find myself wondering about, but I just can't make the time to seek out the information. Reading an article like this over my morning "wake-up" cup of coffee is where I love to pick up these professional opinions. In fact, you stated that you will go into detail in a later article about this very topic and I look forward to reading that one. Thanks again, Gabe

chapman.tim
chapman.tim

Thank you. I'm glad you found it helpful. I may not have worded that statement as clear as I could have. An example of what I was talking is the table valued UDF at the end of the article. In that example, I am using table field values (as it turns out i am using values from one table, but could have used any number of tables) and passing them into a table function that returns aggregated values based upon those values passed into the function. Does that make sense?

chapman.tim
chapman.tim

I'm glad to share this kind of information. Great to hear that it helped.