
Visual Studio compiler options, part 3: Option Compare

Option Compare is a compiler option in VB.NET with the right default for all new projects. Here's what SMBs need to know about the option.

This is the third installment in my series on setting compiler options for Visual Studio VB.NET projects, versions 2005-2010. Read the previous installments: Setting Visual Studio Compiler options, part 1 and Visual Studio compiler options, Part 2: Option Strict.

Compiler options are project-level settings that determine how the compiler behaves when it compiles your code. You view and set compiler options in the Compile tab of the project properties sheet (Figure A). In this post, I will discuss the Option Compare switch.

[Figure A: the Compile tab of the project properties sheet]

What it means

Option Compare is a two-option switch that determines how the CLR compares string data in comparison operations that use the =, <>, <, >, and Like operators. The two options are Binary (the way C# does it) and Text. In all versions of VB.NET, the default is Binary. So if you install Visual Studio and don't change anything, your string comparisons are going to act just like they do in C# in all cases. (The C# compiler doesn't give you a choice.)
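To make the default concrete, here is a minimal sketch of how the affected operators behave under Binary compare. The module name is made up for illustration, and the Option statement only spells out what a fresh project already does.

```vb
Option Compare Binary   ' the default, stated explicitly for clarity
Imports System

Module BinaryCompareDemo   ' hypothetical module name
    Sub Main()
        ' = and <> are case-sensitive under Binary compare
        Console.WriteLine("Apple" = "apple")     ' False
        Console.WriteLine("Apple" <> "apple")    ' True

        ' Like is case-sensitive as well
        Console.WriteLine("Apple" Like "a*")     ' False
        Console.WriteLine("Apple" Like "A*")     ' True
    End Sub
End Module
```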

Contrary to what the name implies, this does not affect most string comparison methods available to you in VB.NET. There are dozens of string comparison methods other than =, <>, <, >, and Like, and none of them are affected by Option Compare. This includes a wide variety of static (shared) and instance methods on classes that implement IEnumerable and IComparable, such as the String, Enumerable, and Array classes. Here's a mostly complete list of methods that ignore Option Compare.

String: Equals(), Compare()

Enumerable: Any(), All(), Contains(), Distinct(), ElementAt(), Except(), First(), FirstOrDefault(), GroupBy(), GroupJoin(), Last(), LastOrDefault(), Max(), Min(), OrderBy(), SequenceEqual(), TakeWhile(), ThenBy(), Union(), Where()

Array: Exists(), Find(), FindAll(), FindIndex(), FindLast(), FindLastIndex(), IndexOf(), LastIndexOf(), Reverse(), Sort(), TrueForAll()

IComparable: CompareTo(), Contains(), EndsWith(), Equals(), IndexOf(), IndexOfAny(), LastIndexOf(), LastIndexOfAny(), Replace(), Split(), StartsWith(), Substring()

This means that every comparison method you invoke on a generic List, Array, or Collection ignores Option Compare whether you intend it or not, as do the methods in the above list that you invoke on Strings. You can still override the project-wide Option Compare setting for individual source files, and therefore for the modules, classes, and structures they contain.
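The split between operators and methods is easy to see in a file that opts into Text compare. This is only a sketch; the module and variable names are invented for the example.

```vb
Option Compare Text
Imports System

Module OperatorsVersusMethods   ' hypothetical module name
    Sub Main()
        Dim stocked As String = "Apple"
        Dim requested As String = "apple"

        ' The = operator honors Option Compare Text
        Console.WriteLine(stocked = requested)                ' True

        ' String.Equals ignores the option: ordinal and case-sensitive
        Console.WriteLine(stocked.Equals(requested))          ' False

        ' Array.IndexOf ignores it too, so the lookup finds nothing
        Dim basket() As String = {"Apple", "Pear"}
        Console.WriteLine(Array.IndexOf(basket, requested))   ' -1
    End Sub
End Module
```

The same source file gives two different answers to "are these equal?", which is exactly the kind of surprise worth knowing about before you flip the switch.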

Let's get into the details.

Binary

A Binary compare looks at the binary representation of the string data, as opposed to any kind of alphanumeric representation, to determine whether two strings are equivalent. For example, the binary representation of the uppercase letter A is 01000001. (Unicode and ASCII are different encoding schemes, but this is not the best place to get into the differences, so I'm using the simple case in which ASCII and UTF-8 are essentially the same.) A Binary compare looks at this binary representation, not the human-readable form of what it represents.

The importance of this becomes evident as soon as you care about code page precision: case sensitivity, mixed character sets, or any scenario in which every possible character must be treated as distinct from every other. The UTF-8 binary representation of the lowercase letter a is 01100001, which is clearly not the same thing as the uppercase A, which is 01000001. Using Binary compare, there is no way a = comparison of A and a is going to return True.

The same is true for accented characters. Consider the various possibilities for the humble UTF-8-encoded lowercase a once you add the accented variants.

a    Binary Value         Accent
à    11000011 10100000    Grave
á    11000011 10100001    Acute
â    11000011 10100010    Circumflex
ã    11000011 10100011    Tilde
ä    11000011 10100100    Diaeresis
å    11000011 10100101    Ring above
æ    11000011 10100110    (not an accent, just the funky ae)

By default, VB.NET is going to consider A <> a, and a <> â.
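If you want to check these bit patterns yourself, a small helper along these lines will print them. ToBits is a name I made up for this sketch, and the accented character is built with ChrW so the result doesn't depend on how the source file is saved.

```vb
Imports System
Imports System.Text

Module BitPatterns   ' hypothetical module name
    ' Returns the UTF-8 bytes of value as space-separated 8-bit binary strings
    Function ToBits(ByVal value As String) As String
        Dim bytes() As Byte = Encoding.UTF8.GetBytes(value)
        Dim parts(bytes.Length - 1) As String
        For i As Integer = 0 To bytes.Length - 1
            parts(i) = Convert.ToString(bytes(i), 2).PadLeft(8, "0"c)
        Next
        Return String.Join(" ", parts)
    End Function

    Sub Main()
        Console.WriteLine(ToBits("A"))           ' 01000001
        Console.WriteLine(ToBits("a"))           ' 01100001
        Console.WriteLine(ToBits(ChrW(&HE2)))    ' 11000011 10100010 (a with circumflex)
    End Sub
End Module
```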

But string comparison is not just about equivalence; it is also about precedence (sort order). A, a, and â are not just not equal; they have a specific place in line, one before the other. Under Binary compare, the < and > operators determine that precedence from the numeric code value of each character, which yields an ascending sort order like this:

A < a < â

(This agrees with the MSDN documentation for Option Compare Binary.)

When compared with < or > under Binary compare, an uppercase A (code 65) will always come before a lowercase a (code 97), and both come before the accented characters, whose codes are higher still.

The code values determine precedence among every single character, so unaccented and accented characters have relative precedence, and an uppercase Z appears before both a lowercase a and an uppercase À:

A < Z < a < À < â

If you see a < A < À < â < Z instead, you are looking at a linguistic (culture-sensitive) sort, which is what Array.Sort() and OrderBy() apply to strings by default. Those methods ignore Option Compare.

That is the upshot of Binary compare — every possible character representation is unique and has a defined sort order.
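The ordering you observe depends on which kind of comparison is in play, so it is worth checking rather than assuming. The sketch below contrasts the ordinal comparison that the operators use under Option Compare Binary with a linguistic comparison under the invariant culture; the module name is invented for the example.

```vb
Option Compare Binary
Imports System

Module SortOrderDemo   ' hypothetical module name
    Sub Main()
        ' Binary compare: < follows the numeric character codes (A = 65, a = 97)
        Console.WriteLine("A" < "a")                               ' True
        Console.WriteLine(String.CompareOrdinal("A", "a") < 0)     ' True

        ' Linguistic compare: lowercase a sorts ahead of uppercase A
        Console.WriteLine(String.Compare("a", "A", StringComparison.InvariantCulture) < 0)   ' True
    End Sub
End Module
```

An order with lowercase first therefore points to a culture-sensitive sort method rather than to the operators themselves.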

Text

Text compare affects comparisons involving the =, <>, <, >, and Like operators. It essentially makes exceptions based on the text representation of the characters. The most important exception is that it treats case-different characters as the same for purposes of equivalence; sort order is unaffected. Thus, under Text compare:

A = a

 = â

A <> Â

But:

A = a < À < â < Z

Under Text compare, the sort order is determined by your system's locale. Because a Text compare treats A and a as equal, the < and > operators cannot place one ahead of the other; case variants are put in order only by sort methods such as Array.Sort(), OrderBy(), or ThenBy(), which ignore Option Compare.


Why it's important

There are several reasons you should give some thought to this option. Whether one is more important than another depends on what your needs and priorities are.

Equivalence

Because a Binary compare is case-sensitive, you must understand that with this option, all tests for equivalence will be case-sensitive. If that's what you want, all is well. If not, you will have to think about string comparisons everywhere you do them, and make sure you are comparing what you think you are comparing. You will need to override the default at local scope every time you want a case-insensitive check for equivalence.

The classic example is any time you check to see if one string equals another:

    If FruitBasket.FruitType = Fruit.TypeEnum.Apple.ToString Then
        MakeApplePie()
    End If

"Apple" and "apple" are not the same in Binary compareland. So if you are going to stick to the default, and if you are going to compare strings, and if in the business rules of the use case "Apple" and "apple" are equivalent, you will have to override the default.

One way to do this is to dispense with the = operator altogether and just use one of the String comparison methods with an explicit StringComparison argument, something like this:

    If FruitBasket.FruitType.Equals(Fruit.TypeEnum.Apple.ToString, _
            StringComparison.OrdinalIgnoreCase) Then
        MakeApplePie()
    End If

If all string comparisons in one particular module are going to abide by the rule that "Apple" and "apple" are equivalent, you can set Option Compare to Text for the whole module and not mess with your = operators. It must be the first line in the file:

    Option Compare Text
    Imports System.Text
    Imports System.Collections.Generic

    Public Class AppleComparer
        ' CheckBasket is a placeholder name; the If block must live inside a procedure
        Public Sub CheckBasket()
            If FruitBasket.FruitType = Fruit.TypeEnum.Apple.ToString Then
                MakeApplePie()
            End If
        End Sub
    End Class

Note that if you have more than one class defined in a file, the Option Compare statement still must go at the top of the file, and it will apply to all classes defined in that file. You cannot do this:

    Imports System.Text
    Imports System.Collections.Generic

    Public Class Dessert
        Public Sub Prepare()   ' placeholder method name
            If FruitBasket.FruitType = Fruit.TypeEnum.Apple.ToString Then
                MakeApplePie()
            End If
        End Sub
    End Class

    ' Not allowed: an Option statement cannot follow other code in the file
    Option Compare Text
    Public Class Entree
        Public Sub Prepare()   ' placeholder method name
            If Pantry.IsEmpty Then
                GoShopping()
            End If
        End Sub
    End Class

If your module is not so simple that you can set Option Compare Text for the entire file, or if you need case-sensitive comparisons to rule the day throughout most of the module, you might be tempted to convert the strings to upper- or lowercase before comparing them. Don't do it.

    If FruitBasket.FruitType.ToUpper = Fruit.TypeEnum.Apple.ToString.ToUpper Then
        MakeApplePie()
    End If

That approach has two downsides. One, it converts the strings to a different case, thus creating new strings, which is a performance hit. Two, it will not work reliably in languages whose case rules differ from English; Turkish, for example, uppercases i to a dotted İ rather than I. You should use a String comparison method with a StringComparison.OrdinalIgnoreCase parameter instead.
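Here is a small sketch of the second downside in action. It assumes Turkish culture data is installed on the machine, and the module name is invented for the example.

```vb
Imports System
Imports System.Globalization

Module CaseFoldingPitfall   ' hypothetical module name
    Sub Main()
        ' Assumption: the tr-TR culture is available on this machine
        Dim turkish As New CultureInfo("tr-TR")

        ' Turkish rules uppercase "i" to dotted capital I (U+0130), not "I"
        Dim upper As String = "title".ToUpper(turkish)
        Console.WriteLine(String.Equals(upper, "TITLE"))   ' False

        ' A case-insensitive ordinal comparison sidesteps culture rules
        Console.WriteLine(String.Equals("title", "TITLE", StringComparison.OrdinalIgnoreCase))   ' True
    End Sub
End Module
```

ToUpper with no argument uses the current culture, so the same failure can show up on a user's machine without any explicit CultureInfo in your code.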

Wherever case sensitivity is a barrier to correct results in straight-up string comparisons, you will have to do something to ensure case-insensitive comparisons. That either means setting Option Compare to Text in your project, setting it to Text in your modules, or forcing case-insensitive comparisons at local scope.

Performance

It's nice to have so many different ways to compare strings in VB.NET, but they do not perform the same. To show you what I mean, I took the words "Macintosh" and "macintosh" and compared them in batches. In each batch, I compared the strings 10,000 times in loops of 100 iterations, netting 1,000,000 back-to-back comparisons per batch. I ran each batch several times (approximately 10 each), so I could get a range of results. Here is what I got for the average elapsed number of milliseconds per batch of 10,000 comparisons.

Operator   Option Compare Setting  Case Sensitivity                     Low Avg.  High Avg.
=          Text                    N/A                                  0.03      0.04
=          Text                    ToUpper/Lower, but N/A               0.04      0.09
>          Text                    N/A                                  0.03      0.04
>          Text                    ToUpper/Lower, but N/A               0.04      0.10
Like       Text                    N/A                                  0.49      0.65
Like       Text                    ToUpper/Lower, but N/A               0.41      0.54
=          Binary                  Case sensitive                       0.0       0.0
=          Binary                  ToUpper/Lower case insensitive       0.03      0.07
>          Binary                  Case sensitive                       0.0       0.0
>          Binary                  ToUpper/Lower case insensitive       0.03      0.06
Like       Binary                  Case sensitive                       0.0       0.0
Like       Binary                  ToUpper/Lower case insensitive       0.05      0.09
Equals     Binary                  StringComparison.Ordinal             0.0       0.0
Equals     Binary                  StringComparison.OrdinalIgnoreCase   0.0       0.0
CompareTo  Binary                  StringComparison.Ordinal             0.01      0.02
CompareTo  Binary                  StringComparison.OrdinalIgnoreCase   0.0       0.0

The ranges for ToUpper and ToLower were the same, so I'm listing them together. Some of the operators are so fast they didn't even register elapsed times. Others are much slower, with Like being the worst of all (by far) when using Option Compare Text.
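If you want to run similar measurements on your own hardware, a timing loop can be as simple as the sketch below. This is not the harness behind the numbers above; the Stopwatch approach, the loop size, and all of the names are my assumptions for illustration.

```vb
Imports System
Imports System.Diagnostics

Module CompareTiming   ' hypothetical module name
    Sub Main()
        Dim first As String = "Macintosh"
        Dim second As String = "macintosh"
        Const Comparisons As Integer = 10000   ' comparisons per timed loop (assumed)
        Dim matches As Integer = 0             ' gives each comparison an observable effect

        Dim watch As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 1 To Comparisons
            If String.Equals(first, second, StringComparison.OrdinalIgnoreCase) Then
                matches += 1
            End If
        Next
        watch.Stop()

        Console.WriteLine("Matches: {0}", matches)
        Console.WriteLine("Elapsed ms: {0:F4}", watch.Elapsed.TotalMilliseconds)
    End Sub
End Module
```

Swap the body of the If for whichever comparison you want to measure, and run it several times, because individual runs vary.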

Here's the same set of results when sorted from fastest to slowest.

Operator   Option Compare Setting  Case Sensitivity                     Low Avg.  High Avg.
=          Binary                  Case sensitive                       0.0       0.0
>          Binary                  Case sensitive                       0.0       0.0
CompareTo  Binary                  StringComparison.OrdinalIgnoreCase   0.0       0.0
Equals     Binary                  StringComparison.Ordinal             0.0       0.0
Equals     Binary                  StringComparison.OrdinalIgnoreCase   0.0       0.0
Like       Binary                  Case sensitive                       0.0       0.0
CompareTo  Binary                  StringComparison.Ordinal             0.01      0.02
=          Text                    N/A                                  0.03      0.04
>          Text                    N/A                                  0.03      0.04
>          Binary                  ToUpper/Lower case insensitive       0.03      0.06
=          Binary                  ToUpper/Lower case insensitive       0.03      0.07
=          Text                    ToUpper/Lower, but N/A               0.04      0.09
>          Text                    ToUpper/Lower, but N/A               0.04      0.10
Like       Binary                  ToUpper/Lower case insensitive       0.05      0.09
Like       Text                    ToUpper/Lower, but N/A               0.41      0.54
Like       Text                    N/A                                  0.49      0.65

What to do with it

I don't see any benefit in using Option Compare Text. The only thing it gives you is a case-insensitive comparison method that is slower than most of your Binary alternatives, and much slower than some of them. The Like operator under Option Compare Text is a true slowpoke. The others are better, but not good enough to compete. You can leave Option Compare at its default of Binary without any downsides.

But even if you're an Option Compare Binary junkie, this exercise points out the downside to several of your comparison options. ToUpper and ToLower are much slower than the StringComparison alternatives. It may be that the difference is not significant in your applications; that would certainly be the case in a desktop application where string comparisons are few. But in a multi-user application or an application running on an underpowered device, you're better off going with the much faster alternatives. And there's really no benefit in habitually using a slow string comparison method.

Exceptions and variations

The only scenario in which I would recommend sticking with Option Compare Text is the legacy one. If your boss hands you an application that has Option Compare set to Text, either for the whole project or for a module, make sure you review the code before you change it to Binary. Once you make the switch, the application will behave differently, so you'll want appropriate alternatives in place at every comparison that relied on Text before you do.

Conclusion

Option Compare is a compiler option in VB.NET with the right default for all new projects. Leave it alone. Then scour your code for string-comparison bottlenecks and implement the much-faster alternatives available to you.
