You are probably familiar with the built-in sorting abilities of SQL Server for columnar data, but what if you need to sort character strings stored in SQL Server fields? Even though this may not be a common need for most users, this issue may come up from time to time. I’ll demonstrate how to write a common sorting algorithm using SQL Server TSQL code.
A sorting algorithm is a routine that sorts a set of data, typically stored in an array, a set, or other data structure. There are many types of sorting algorithms — some of which are very fast, sophisticated, and complex, and some are pretty basic. It’s best to write the more complex algorithms in iterative languages, such as C++, where it is easy to define data structures to hold the data, and the constructs to sort are readily available. You can write the more simple algorithms in TSQL, although some iteration is required.
In this example, I will implement a variation of the Bubble Sort sorting algorithm, which is one of the easiest sorting algorithms to implement because of its “brute force” nature. It loops through the characters, checking each one against the one next to it until it has determined that the strings (or numbers) are in sorted order. Because of how it’s implemented, Bubble Sort is one of the worst performing sorting algorithms, and it is only used for sorting small sets of data. My example will contain small sets, so this algorithm will be fine, although I wouldn’t run it on a field in a table with millions of records.
In “algorithm terms” Bubble Sort is O(n2), which means the maximum number of operations to sort the set is the number of items in the set squared. For my example, this number will be a little bit worse because of the manner in which I have to do some of the swapping in TSQL. If this algorithm was implemented in C++, you could write it in such a way as to performing at O(n2).
Before I get to the sorting function, I need to write a helper function to assist me with the sorting. This function will do the work of swapping two characters based on specific positions in the character string.
If I have the string called elephant, and I need to swap the second and fourth characters of the string, my swap function will produce epelhant. It may not make much sense to do this right now, but it is vital to the way in which our Bubble Sort algorithm works. (The example in this article applies to SQL Server 2000 and later.)
CREATE FUNCTION dbo.udf_Swap
DECLARE @returnval varchar(1000)
DECLARE @begin VARCHAR(1000), @middle VARCHAR(1000), @end VARCHAR(1000)
DECLARE @firstchar CHAR(1), @secondchar CHAR(1), @len INT
SET @fullstring = LTRIM(RTRIM(@fullstring))
SET @len = LEN(@fullstring)
IF @charlocation1 > @len OR @charlocation2 > @len
SET @returnval = @fullstring
SET @firstchar = SUBSTRING(@fullstring, @charlocation1, 1)
SET @secondchar = SUBSTRING(@fullstring, @charlocation2, 1)
SET @begin = LEFT(@fullstring, (@charlocation1-1))
SET @middle = SUBSTRING(@fullstring, @charlocation1+1, (@charlocation2-@charlocation1)-1)
SET @end = SUBSTRING(@fullstring, @charlocation2+1, @len)
SET @returnval = @begin + @secondchar + @middle + @firstchar + @end
Now that I have my helper function, I can write the sorting function. This function, udf_SortString, accepts a string of characters and returns them in sorted order. If there are numbers, they will be returned first, followed by any alpha characters. If the string has one or more words, the words will be sorted individually.
There are two loops in this function that do the sorting work. The first loop’s job is to loop until characters have been swapped. Once a set of characters have been swapped, it is time to start at the beginning and do another scan. This loop will end once no characters have been swapped, and the string is in sorted order.
The second loop examines each character in the string and compares it to the next character in the string. If the characters are in sorted order, it moves to the next character. If the characters are not sorted, it uses the function I created earlier and swaps them. Then the loop starts over, and the characters are examined again and again until no more swapping occurs. Once a pass is made and there is no swapping, the string is in sorted order.
CREATE FUNCTION udf_SortString
DECLARE @len TINYINT
DECLARE @i TINYINT
DECLARE @currentchar CHAR(1)
DECLARE @swapped BIT
DECLARE @begin BIT
DECLARE @nextchar CHAR(1)
SET @begin = 1
SET @len = LEN(@string)
SET @i = 1
WHILE @begin = 1 OR @swapped = 1
SET @swapped = 0
SET @i = 1
SET @begin = 0
WHILE @i <= @len
SET @currentchar = SUBSTRING(@string, @i, 1)
SET @nextchar = SUBSTRING(@string, @i + 1, 1)
IF @currentchar > @nextchar AND (@nextchar > '')
SET @string = dbo.udf_swap(@string, @i, @i + 1)
SET @swapped = 1
SET @i = @i + 1
All I need to do is call my function to sort a string:
Sort it out
Most database users don’t think about how much work goes into sorting data and searching for data because the database provides the functionality. By understanding how this works, it gives me a greater appreciation of the functionality.
While you may not apply the information you learned in this example for most users, I wanted to show how you can write your own string sorting functions in TSQL if the need arises. Also, I wanted to provide an overviewhope you learned about sorting algorithms if you weren’t already aware of them.
Tim Chapman a SQL Server database administrator and consultant who works for a bank in Louisville, KY. Tim has more than eight years of IT experience, and he is a Microsoft certified Database Developer and Administrator. If you would like to contact Tim, please e-mail him at email@example.com.
Get database tips in your inbox
TechRepublic’s free Database Management newsletter, delivered each Tuesday, contains hands-on SQL Server and Oracle tips and resources. Automatically subscribe today!