Collation describes the code page, case sensitivity, accent
sensitivity, and language or alphabet in use. I often see collation handled on
a per-database scale. For instance, in my experience, most DBAs
in North America go with the default collation, which is case insensitive. The
collation has no effect on the data itself, but rather how it is compared and
This tip illustrates how you can force the use of any
specific collation without regard to the collation specified when the database
of interest is created. It also shows some of the complications involved when
your database attempts to deal with multiple character sets.
To try out the experiments in this column, create a test
database and populate it with a few rows, as in
Listing A. (Note: Regular readers of my tips will notice that I
create a schema in which to house the tables related to any given tip. This is
because I always use a single database called SQLTips
for this purpose. Using schemas nicely groups the tables relevant to any
contains a few test queries for you to run. In each case, the query returns all four rows. This is to be
expected when a case-insensitive collation is chosen. But suppose your task is
to locate one of these rows using an exact match?
This is where collation comes in. Although a DBA usually
specifies the collation when the database is created, you can apply a different
collation after the fact simply by adding a COLLATE predicate to the WHERE
clause. See Listing C. Each of the first three queries returns exactly one row,
while the fourth query returns no rows.
Suppose that you want to change all possible spellings of
‘fuller’ (including the nonsense ones such as ‘fUlLEr’)
to ‘Fuller’. You can do that quite easily with the code in Listing D.
I like to
run such a transaction when dealing with data inherited from a mainframe
database because I tend to think of upper-case words as yelling, and doing it
once to the data enables me to avoid worrying about it thereafter.
If you want
a list of all the collations available, along with terse explanations, run this
SELECT * FROM ::fn_helpcollations()
determine the default collation of the database of interest, do this:
In Listing E, I
add some rows containing special alphabetic characters. I use the name of my
good friend and colleague, DejanSunderic,
whose surname is properly written as Šunderic’ (the
trailing ‘c’ should actually have an acute accent above it — the display is a
result of the selected font).
As you see,
interesting things occur without the COLLATE predicate:
WHERE Name LIKE N'S%'
produces the following result set:
the special case of ‘Šunderic’ is not
included. Let’s experiment further:
SELECT * FROM CaseCheck.Names
ORDER BY Name
produces the following result set:
clearly incorrect. PK 5 should reside somewhere below all occurrences of
‘Fuller.’ I am no language expert, so I cannot claim to know where among the Ss this
ought to reside, but I suspect that this decision depends on local rules. It’s clear that ‘Šunderic’ ought to reside somewhere
after all occurrences of ‘Fuller’ and not before — but where in the list of
names beginning with ‘S’?
I did another experiment using the name of my friend
Alexander Karmanov. To see his name in the Cyrillic alphabet, check out Figure A. I ran the statement in
Figure B using the Cyrillic
alphabet spelling of his name.
default character set in SSMS, a query on the table returns his name as a
series of question marks. However, if I run the following statement, SSMS
returns the correct row but still displays his name as a series of question
marks due to the default character set. The important thing is that the Unicode
characters were entered and accepted. See Figure C.
If you want to determine the native collation in a given database, use
WHERE Name = 'SQLTips'
Miss a tip?
Check out the SQL Server archive, and catch up on the most recent editions of Arthur Fuller’s column.
Subscribe to the Data Insider Newsletter
Learn the latest news and best practices about data science, big data analytics, artificial intelligence, data security, and more. Delivered Mondays and Thursdays