Visual Studio compiler options, part 3: Option Compare

Option Compare is a compiler option in VB.NET with the right default for all new projects. Here's what SMBs need to know about the option.

This is the third installment in my series on setting compiler options for Visual Studio VB.NET projects, versions 2005-2010. Read the previous installments: Setting Visual Studio Compiler options, part 1 and Visual Studio compiler options, Part 2: Option Strict.

Compiler options are project-level settings that determine how the compiler behaves when it compiles your code. You view and set compiler options in the Compile tab of the project properties sheet (Figure A). In this post, I will discuss the Option Compare switch.

Figure A

What it means

Option Compare is a two-option switch that determines how your compiled VB.NET code compares string data in comparison operations that use the =, <>, <, >, <=, >=, and Like operators. The two options are Binary (the way C# does it) and Text. In all versions of VB.NET, the default is Binary. So if you install Visual Studio and don't change anything, your string equality comparisons are going to act just like they do in C#. (The C# compiler doesn't give you a choice.)

Contrary to what the name implies, this does not affect most string comparison methods available to you in VB.NET. The .NET Framework offers dozens of string comparison methods other than the =, <>, <, >, and Like operators, and none of them are affected by Option Compare. This includes a wide variety of static (shared) and instance methods on the String, Enumerable, and Array classes, and on types that implement IEnumerable and IComparable. (The exceptions are Visual Basic's own runtime functions, such as StrComp() and InStr(), which do follow Option Compare when you omit their Compare argument.) Here's a mostly complete list of methods that ignore Option Compare.

String (shared): Equals(), Compare()

Enumerable: Any(), All(), Contains(), Distinct(), ElementAt(), Except(), First(), FirstOrDefault(), GroupBy(), GroupJoin(), Last(), LastOrDefault(), Max(), Min(), OrderBy(), SequenceEqual(), TakeWhile(), ThenBy(), Union(), Where()

Array: Exists(), Find(), FindAll(), FindIndex(), FindLast(), FindLastIndex(), IndexOf(), LastIndexOf(), Reverse(), Sort(), TrueForAll()

IComparable: CompareTo()

String (instance): Contains(), EndsWith(), Equals(), IndexOf(), IndexOfAny(), LastIndexOf(), LastIndexOfAny(), Replace(), Split(), StartsWith(), Substring()

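This means that the comparison methods you invoke on a generic List, Array, or Collection, and the methods in the above list that you invoke on Strings, follow their own built-in rules whether you intend it or not. String.Equals() compares ordinally (the equivalent of Binary), while String.Compare(), CompareTo(), and the default sort comparers are culture-sensitive and case-sensitive. Setting Option Compare to Text does not change any of them. What you can change is the scope of the option itself: an Option Compare statement at the top of a source file overrides the project-level setting for the modules, classes, and structures in that file.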
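To see the difference between the operators and the methods, here is a minimal sketch. It assumes a console project and a file-level Option Compare Text statement; the module and variable names are made up for illustration.

```vb
Option Compare Text

Imports System

Module IgnoresOptionCompareDemo
    Sub Main()
        Dim upper As String = "ABC"
        Dim lower As String = "abc"

        ' The = operator follows Option Compare Text, so case is ignored.
        Console.WriteLine(upper = lower)                 ' True

        ' String.Equals ignores Option Compare and compares ordinally.
        Console.WriteLine(upper.Equals(lower))           ' False
        Console.WriteLine(String.Equals(upper, lower))   ' False

        ' To ignore case in a method call, you have to ask for it explicitly.
        Console.WriteLine(String.Equals(upper, lower, StringComparison.OrdinalIgnoreCase)) ' True
    End Sub
End Module
```

Passing a StringComparison value is one way to get case-insensitive behavior from a method; it is shown here as an option, not as the only approach.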

Let's get into the details.

Binary

A Binary compare looks at the binary representation of the string data, as opposed to any kind of alphanumeric representation, to determine whether two strings are equivalent. For example, the binary representation of the uppercase letter A is 01000001. (Unicode and ASCII are different encoding schemes, and .NET stores strings internally as UTF-16, but this is not the best place to get into the differences, so I'm using the simple case in which ASCII and UTF-8 are essentially the same.) A Binary compare looks at this binary representation, not the human-readable form of what it represents.

The importance of this becomes evident as soon as you care about character-level precision, that is, case sensitivity, mixed character sets, or any scenario in which every possible character must be treated as distinct from every other. The UTF-8 binary representation of the lowercase letter a is 01100001, clearly not the same thing as the uppercase A, which is 01000001. Using Binary compare, there is no way a = comparison of A and a is going to return True.

The same is true for accented characters. Consider the various possibilities for the humble UTF-8-encoded lowercase a once you add the accented variants.

Character  Binary value (UTF-8)  Accent
à          11000011 10100000     Grave
á          11000011 10100001     Acute
â          11000011 10100010     Circumflex
ã          11000011 10100011     Tilde
ä          11000011 10100100     Diaeresis
å          11000011 10100101     Ring above
æ          11000011 10100110     (not an accent, just the funky ae)
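If you want to reproduce the bit patterns in the table yourself, the following sketch prints the UTF-8 bytes of a few characters in binary. The module name is hypothetical. Note that this shows the UTF-8 encoding used for illustration here, not how .NET holds the string in memory, and the accented characters may not display correctly in every console.

```vb
Imports System
Imports System.Text

Module Utf8BitsDemo
    Sub Main()
        For Each ch As String In New String() {"A", "a", "à", "â"}
            ' Encode the character as UTF-8 and render each byte as 8 binary digits.
            Dim bytes As Byte() = Encoding.UTF8.GetBytes(ch)
            Dim bits As New StringBuilder()
            For Each b As Byte In bytes
                bits.Append(Convert.ToString(b, 2).PadLeft(8, "0"c)).Append(" "c)
            Next
            ' "A" gives 01000001 and "a" gives 01100001; accented letters give two bytes.
            Console.WriteLine(ch & " " & bits.ToString().TrimEnd())
        Next
    End Sub
End Module
```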

By default, VB.NET is going to consider A <> a, and a <> â.

But string comparison is not just about equivalence; it is also about precedence (sort order). A, a, and â are not just not equal; they have a specific place in line, one before the other. Under Binary compare, that precedence is determined by each character's numeric code value. For the characters in the English code page ANSI 1252 (which is used by English and most Western European languages), that yields an ascending sort order like this:

A < a < â

This matches the MSDN documentation. Every unaccented uppercase letter (codes 65 through 90) comes before every unaccented lowercase letter (97 through 122), and the accented letters (192 and up) come after both.

The code values determine precedence among every single character, so unaccented and accented characters have relative precedence too. With the <, >, <=, and >= operators under Binary compare, an uppercase Z appears before an uppercase À:

A < Z < a < À < â

Be careful when you test this, because sorting methods such as Array.Sort() and OrderBy() do not use Binary compare by default. They use the current culture's rules, and on an English-locale system they typically produce a different order:

a < A < À < â < Z

That is the upshot of Binary compare: every possible character representation is unique and has a defined sort order.
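A quick way to check the ordering on your own machine is to run the operators and a sort side by side. The sketch below assumes Option Compare Binary at the top of the file and a hypothetical module name. The operators compare character code values, while Array.Sort() with no comparer uses the current culture's rules, so the two can disagree. The result of the default sort depends on your locale, so no expected output is given for it.

```vb
Option Compare Binary

Imports System

Module BinaryOrderDemo
    Sub Main()
        ' Equality under Binary compare is case-sensitive and accent-sensitive.
        Console.WriteLine("A" = "a")   ' False
        Console.WriteLine("a" = "â")   ' False

        ' The ordering operators compare character code values.
        Console.WriteLine("A" < "a")   ' True (65 < 97)
        Console.WriteLine("Z" < "À")   ' True (90 < 192)

        Dim letters As String() = {"Z", "â", "A", "À", "a"}

        ' Default sort: culture-sensitive and independent of Option Compare.
        Array.Sort(letters)
        Console.WriteLine(String.Join(" ", letters))

        ' Ordinal sort: the same order the Binary operators use.
        Array.Sort(letters, StringComparer.Ordinal)
        Console.WriteLine(String.Join(" ", letters))   ' A Z a À â
    End Sub
End Module
```

If you need a sort that agrees with the Binary operators, passing StringComparer.Ordinal as shown is one option.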

Text

Text compare affects comparisons involving the =, <>, <, >, <=, >=, and Like operators. It essentially makes exceptions based on the text representation of the characters. The most important exception is that it treats case-different characters as the same for purposes of equivalence. It also orders characters by the rules of your system's locale rather than by raw code value, so accented letters sort next to their base letters. Thus, under Text compare:

A = a

 = â

A <> Â

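And in terms of precedence:

A = a < À < â < Z

Because case-different characters compare as equal under Text compare, the < and > operators cannot put A and a in order; both "A" < "a" and "a" < "A" return False. If you need a sort that places a before A, use a sorting method such as Array.Sort(), List(Of T).Sort(), OrderBy(), or ThenBy(), which apply their own culture-sensitive, case-sensitive rules regardless of Option Compare.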
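Here is a small sketch of the operators under Text compare. It assumes a file-level Option Compare Text statement, an English-locale system, and a hypothetical module name; results for other characters can vary with the current culture.

```vb
Option Compare Text

Imports System

Module TextCompareDemo
    Sub Main()
        ' Case is ignored for equality...
        Console.WriteLine("A" = "a")          ' True
        Console.WriteLine("Â" = "â")          ' True

        ' ...but accents still matter.
        Console.WriteLine("A" <> "Â")         ' True

        ' Strings that compare as equal have no order, so neither test is True.
        Console.WriteLine("A" < "a")          ' False
        Console.WriteLine("a" < "A")          ' False

        ' Like is also case-insensitive under Text compare.
        Console.WriteLine("Hello" Like "h*")  ' True
    End Sub
End Module
```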

Read page two to learn why it's important.
