Software Development

Sorting character strings using SQL Server

If you ever need to sort character strings stored in SQL Server fields, check out this demonstration of how to write a common sorting algorithm using SQL Server TSQL code.

You are probably familiar with the built-in sorting abilities of SQL Server for columnar data, but what if you need to sort character strings stored in SQL Server fields? Even though this may not be a common need for most users, this issue may come up from time to time. I'll demonstrate how to write a common sorting algorithm using SQL Server TSQL code.

Sorting algorithms

A sorting algorithm is a routine that sorts a set of data, typically stored in an array, a set, or other data structure. There are many types of sorting algorithms -- some of which are very fast, sophisticated, and complex, and some are pretty basic. It's best to write the more complex algorithms in iterative languages, such as C++, where it is easy to define data structures to hold the data, and the constructs to sort are readily available. You can write the more simple algorithms in TSQL, although some iteration is required.

In this example, I will implement a variation of the Bubble Sort sorting algorithm, which is one of the easiest sorting algorithms to implement because of its "brute force" nature. It loops through the characters, checking each one against the one next to it until it has determined that the strings (or numbers) are in sorted order. Because of how it's implemented, Bubble Sort is one of the worst performing sorting algorithms, and it is only used for sorting small sets of data. My example will contain small sets, so this algorithm will be fine, although I wouldn't run it on a field in a table with millions of records.

In "algorithm terms" Bubble Sort is O(n2), which means the maximum number of operations to sort the set is the number of items in the set squared. For my example, this number will be a little bit worse because of the manner in which I have to do some of the swapping in TSQL. If this algorithm was implemented in C++, you could write it in such a way as to performing at O(n2).

The example

Before I get to the sorting function, I need to write a helper function to assist me with the sorting. This function will do the work of swapping two characters based on specific positions in the character string.

If I have the string called elephant, and I need to swap the second and fourth characters of the string, my swap function will produce epelhant. It may not make much sense to do this right now, but it is vital to the way in which our Bubble Sort algorithm works. (The example in this article applies to SQL Server 2000 and later.)

CREATE FUNCTION dbo.udf_Swap

(

    @fullstring VARCHAR(1000),

    @charlocation1 TINYINT,

    @charlocation2 TINYINT

)

RETURNS VARCHAR(1000)

AS

BEGIN

        DECLARE @returnval varchar(1000)

        DECLARE @begin VARCHAR(1000), @middle VARCHAR(1000), @end VARCHAR(1000)

        DECLARE @firstchar CHAR(1), @secondchar CHAR(1), @len INT

        SET @fullstring = LTRIM(RTRIM(@fullstring))

        SET @len = LEN(@fullstring)

    IF @charlocation1 > @len OR @charlocation2 > @len

        SET @returnval = @fullstring

        ELSE

        BEGIN

               SET @firstchar = SUBSTRING(@fullstring, @charlocation1, 1)

               SET @secondchar = SUBSTRING(@fullstring, @charlocation2, 1)

               SET @begin = LEFT(@fullstring, (@charlocation1-1))

            SET @middle = SUBSTRING(@fullstring, @charlocation1+1, (@charlocation2-@charlocation1)-1)

               SET @end = SUBSTRING(@fullstring, @charlocation2+1, @len)

               SET @returnval = @begin + @secondchar + @middle + @firstchar + @end

        END

    RETURN(@returnval)

END

Now that I have my helper function, I can write the sorting function. This function, udf_SortString, accepts a string of characters and returns them in sorted order. If there are numbers, they will be returned first, followed by any alpha characters. If the string has one or more words, the words will be sorted individually.

There are two loops in this function that do the sorting work. The first loop's job is to loop until characters have been swapped. Once a set of characters have been swapped, it is time to start at the beginning and do another scan. This loop will end once no characters have been swapped, and the string is in sorted order.

The second loop examines each character in the string and compares it to the next character in the string. If the characters are in sorted order, it moves to the next character. If the characters are not sorted, it uses the function I created earlier and swaps them. Then the loop starts over, and the characters are examined again and again until no more swapping occurs. Once a pass is made and there is no swapping, the string is in sorted order.

CREATE FUNCTION udf_SortString

(

    @string VARCHAR(1000)

)

RETURNS VARCHAR(1000)

AS

BEGIN

    DECLARE @len TINYINT

    DECLARE @i TINYINT

    DECLARE @currentchar CHAR(1)

    DECLARE @swapped BIT

    DECLARE @begin BIT

    DECLARE @nextchar CHAR(1)

    SET @begin = 1

    SET @len = LEN(@string)

    SET @i = 1

    WHILE @begin = 1 OR @swapped = 1

    BEGIN

        SET @swapped = 0

        SET @i = 1

        SET @begin = 0

        WHILE @i <= @len

        BEGIN

            SET @currentchar = SUBSTRING(@string, @i, 1)

            SET @nextchar = SUBSTRING(@string, @i + 1, 1)

            IF @currentchar > @nextchar AND (@nextchar > '')

            BEGIN

                SET @string = dbo.udf_swap(@string, @i, @i + 1)

                SET @swapped = 1

            END

            SET @i = @i + 1

        END

    END

    RETURN(@string)

END

GO

All I need to do is call my function to sort a string:

SELECT dbo.udf_SortString('asdofimasdfasdf23rasdf')

Sort it out

Most database users don't think about how much work goes into sorting data and searching for data because the database provides the functionality. By understanding how this works, it gives me a greater appreciation of the functionality.

While you may not apply the information you learned in this example for most users, I wanted to show how you can write your own string sorting functions in TSQL if the need arises. Also, I wanted to provide an overviewhope you learned about sorting algorithms if you weren't already aware of them.

Tim Chapman a SQL Server database administrator and consultant who works for a bank in Louisville, KY. Tim has more than eight years of IT experience, and he is a Microsoft certified Database Developer and Administrator. If you would like to contact Tim, please e-mail him at chapman.tim@gmail.com.

-----------------------------------------------------------------------------------------

Get database tips in your inbox

TechRepublic's free Database Management newsletter, delivered each Tuesday, contains hands-on SQL Server and Oracle tips and resources. Automatically subscribe today!

About

Tim Chapman is a SQL Server MVP, a database architect, and an administrator who works as an independent consultant in Raleigh, NC, and has more than nine years of IT experience.

21 comments
jaja_cherry
jaja_cherry

I have data like below 01062009 01052009 27042009 I want result like this 27042009 01052009 01062009 How can i do like this?

arturo_rojas
arturo_rojas

Excellent. I would like if you can send me more information and examples about Sorting. arturo_rojas@hotmail.com

adam.nachman
adam.nachman

Why not simply use CLR? T-SQL is not exactly the most efficient medium, and these operations will be CPU heavy.

tony.taylor
tony.taylor

>> SET @fullstring = LTRIM(RTRIM(@fullstring)) What makes the above different from the below? SET @fullstring = TRIM(@fullstring)

aikimark
aikimark

@Tim You can get O(N*log(N)), or better, performance if you let the SQL Server database engine do the sorting for you. The process only requires T-SQL iteration on the front end and back end of the ordering process. Doing this removes the limit to "small sets". Steps: 1. Loop through the characters, Inserting them into a temp table. 2. Select Letter From TempTable Order By Letter 3. Iterate through the (2) result set, concatenating the results. Notes: * You might be able to use COALESCE() to perform the concatenation while the ordered letters are being retrieved from the temp table. * If the T-SQL environment doesn't perform concatenation efficiently, you might need to create a CLR-based function to place the characters into a fixed length string (or a StringBuilder object can be used), avoiding concatenation.

bogdincescu
bogdincescu

Nothing new. Only, as an old Oracle developer, I find pathetic T-SQL. At least if it were like VB...

bfartash
bfartash

Using functions is not good in Select Queries , its better to sort field like this in an auxilary field and fill them in triggers.(Behzad fartash bfartash@gmail.com)

tibor.szederkenyi
tibor.szederkenyi

Hi Tim, Could you please tell me what's the advantage of this development compared to SQL Server built-in sort orders and collations? Thnx, Tibor

chapman.tim
chapman.tim

There is no TRIM function in SQL server, just LTRIM & RTRIM....so, you have to combine them to get TRIM().

aikimark
aikimark

Please ignore my COALESCE() comment. It doesn't apply to this problem since there can be no Null values.

chapman.tim
chapman.tim

Sure, I agree. However, the post wasn't about the most effecient way to do it...just to show how a common sorting algorithm works (Bubble Sort). There are for sure faster ways to do it.

MikeSQLDBA
MikeSQLDBA

I don't usually post code on this forum. Sorry the formatting was lost. It shows up correctly where my post is viewed by itself. If you Show All Posts, it doesn't format correctly. It was harder to post this code that is was to write it!

MikeSQLDBA
MikeSQLDBA

If you create a Numbers table, this is easy to do using T-SQL. A Numbers table is very useful for any type of counting or iteration task. One way to create a Numbers table with 65,536 numbers is like this: [pre] CREATE TABLE #N ( Number int NOT NULL , CONSTRAINT PK_N_TEMP PRIMARY KEY CLUSTERED (Number) WITH FILLFACTOR=100 ) INSERT #N (Number) SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION SELECT 10 UNION SELECT 11 UNION SELECT 12 UNION SELECT 13 UNION SELECT 14 UNION SELECT 15 INSERT #N SELECT (a.Number * 16) + b.Number AS Number FROM #N a , #N b WHERE (a.Number * 16) + b.Number > 15 CREATE TABLE Numbers ( Number INT NOT NULL , CONSTRAINT PK_N PRIMARY KEY CLUSTERED (Number) WITH FILLFACTOR=100 ) INSERT INTO Numbers SELECT (a.Number * 256) + b.Number AS Number FROM #N a , #N b DROP TABLE #N [/pre] --------------------------------------- Next, take a look at these two functions. The first returns the sorted string as a varchar(1000), while the second function returns the string a table of single characters (which you can sort with the ORDER BY clause). [pre] GO DROP FUNCTION dbo.fSortString GO CREATE FUNCTION dbo.fSortString ( @string VARCHAR(1000) ) RETURNS VARCHAR(1000) AS BEGIN DECLARE @sortedString VARCHAR(1000) SET @sortedString = '' SELECT @sortedString = @sortedString + C.Chr FROM (SELECT TOP 100 PERCENT N.Number As ChrPos , Substring(@string, N.Number, 1) AS Chr FROM Numbers N WHERE N.Number > 0 AND N.Number 0 AND N.Number

chapman.tim
chapman.tim

No advantage at all...always use SQL's ordering when sorting fields, etc. This was just to show how you could write something to sort a character or numeric string of characters (in the same field)

aikimark
aikimark

@Tim I've found a few decent articles about the fastest string concatenation methods within the SQL Server environment. They make for an interesting read and I've learned something about XML_Path in the process.

aikimark
aikimark

What if we reduce this problem to a counting sort? Assume we output the characters to a temp table, as I've already suggested. Then we use a Group By query to reduce the result set to one row per unique character. The instance count of that character could then be duplicated during the concatenation using the REPLICATE() function.

aikimark
aikimark

@Tim Here's an idea for your next article...When you are left with no option but to code. Given that the database engine is faster than T-SQL, which is faster than CLR functions, etc., what are situations when you have no choice but to do some coding. Conversely, there are probably some identifiable instances where T-SQL was used and the database engine could have greatly simplified the result in quicker results as well.

chapman.tim
chapman.tim

Yeah, I am chapmandew from EE...it is a small world. Great question...For the substring function, no, I don't' believe it can be used in the same way as VB...its funny that you ask that, because I actually tried that for this article. :)

aikimark
aikimark

;-) I think I've seen two different XML-based implementations to concatenate row content. Maybe you can answer another T-SQL question related to string concatenation: Can SUBSTRING() be used on the left side of an equals sign like the VB/VBA MID() function can? If so, that might prove an even faster alternative to concatenation. One could define a string of spaces long enough to contain the row contents and delimiters. The row data, and delimiters, could be placed into the string at the correct location without using any concatenation operations. ==================== Unrelated question...are you the same Tim Chapman with whom I've shared tech discussions on Experts-Exchange? If so, "Hi. Small world."

chapman.tim
chapman.tim

you mean something like this.... select name + ',' from table for xml path('') :)