Check under the MSIL hood to see how the CLR is running

Microsoft Intermediate Language isn't just interesting in the abstract. You're likely to need a solid understanding of MSIL at some point during your .NET experience. Get a crash course here.

As the lowest common denominator of .NET, Microsoft Intermediate Language (MSIL, or IL) is important to the average developer, beyond the obvious curiosity factor. Checking out the IL for an application can give you some insight into how the Common Language Runtime (CLR) is executing your higher-level C# or VB.NET code, possibly helping you find and solve subtle problems.

In this article, I'll take you on a whirlwind tour of IL, touching on some of its key instructions, while explaining a little about how the CLR operates along the way. My intention is not to teach you how to program using IL, just to introduce you to some of its grammar and statements so you can make sense of it.

Introducing ILDASM
Microsoft's IL disassembly utility, Ildasm.exe (usually located in \Program Files\Microsoft.Net\FrameworkSDK\Bin), deconstructs .NET assemblies and extracts IL code from them for your viewing pleasure. When invoked on an assembly, ILDASM initially gives you a view of all the classes and namespaces appearing in that assembly, as shown in Figure A.

Figure A
ILDASM browsing an assembly

When you drill down onto a class member or method, ILDASM will show you the IL code for that member. If you've ever seen assembler or J++ byte code before, IL might look somewhat familiar. If, on the other hand, you've been safely confined to the high tower of abstract programming languages, IL will look like complete gibberish. I think it resembles Klingon, but that's just my opinion.

Okay, you now know how to get a peek at the IL code for an assembly, but what does it all mean? Before answering that, we need to take a quick look at the CLR.

Virtual CPU
The .NET CLR functions as a virtual CPU for a .NET program, executing IL code and performing operations on data. The CLR and a real CPU are similar in that they don't operate on variables directly in memory. Instead, they work on temporary copies of program variables, which the CLR places on a stack. The act of copying a variable from memory to the stack is referred to as loading, while the act of copying a variable from the stack back to memory is referred to as storing.

So the process of adding two numbers would look something like this:
  1. Load the first number and push it onto the stack.
  2. Load the second number and push it onto the stack.
  3. Pop two items off the stack and add them.
  4. Store the result to memory.
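
The four steps above can be sketched in a few lines of Python, with a list standing in for the stack. This is only a toy model: the memory dictionary and the names a, b, and sum are assumptions made up for illustration, not anything the CLR exposes.

```python
# Toy model of the four steps: a Python list plays the stack and a
# dict plays "memory". All names here are illustrative assumptions.
memory = {"a": 2, "b": 3, "sum": None}
stack = []

stack.append(memory["a"])      # 1. load the first number onto the stack
stack.append(memory["b"])      # 2. load the second number onto the stack
right = stack.pop()            # 3. pop two items off the stack...
left = stack.pop()
stack.append(left + right)     #    ...and add them; the sum goes back on the stack
memory["sum"] = stack.pop()    # 4. store the result to memory

print(memory["sum"])           # prints 5
```

Note that the sum makes a brief stop on the stack before it is stored, which is also how IL's add instruction behaves.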

What's a stack?
The key to understanding IL lies in knowing how a stack works. A stack is an abstract data structure that operates on a last-in, first-out basis. When you push new items onto a stack, any items that were already in the stack get pushed farther down into the stack. Similarly, removing an item from the stack causes any other items in the stack to move upward toward the top of the stack. Only the topmost item in the stack can be pulled out of it, so items come out of the stack in the reverse of the order in which they were pushed onto it. Think "Pez dispenser," and you've got the picture.
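
If you'd rather see it than imagine it, a Python list makes a passable Pez dispenser. The sketch below is just an illustration of last-in, first-out behavior; the item names are made up.

```python
# A Python list used as a stack: append() pushes, pop() removes the top item.
stack = []
for item in ["first", "second", "third"]:
    stack.append(item)                     # push each item in turn

# Pop until the stack is empty; the last item pushed comes out first.
popped = [stack.pop() for _ in range(3)]

print(popped)                              # prints ['third', 'second', 'first']
```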

Important IL statements
Now that you've seen the basics of how the CLR operates, we can talk about some of that code you have in front of you. Don't have any in front of you? Then feel free to follow along with the IL code in this sidebar.

The first thing you'll see is the IL declaration for the current method, which will include its name, return type, and argument list, along with any decorations you'd normally attach to a method (static/shared, public, virtual, and so on). Object constructors are given a special .ctor name.

Method arguments are referred to in IL by their position in the argument list. If the method is a static or shared method, argument 0 is the first argument in the list. For instance methods, argument 0 is a pointer to the instance of the method's class (Me or this). Any local variables in the method are declared in a similar fashion in a section labeled .locals.

After any local variable declarations, the actual body of the program usually begins. Each IL instruction, or opcode, if you prefer, has a line label beginning with IL_. We'll look at some of the more important IL instructions in turn.

Working with variables
Instructions beginning with LD operate on variables by loading them from memory onto the stack. There are several load instructions, each one designed to operate on a particular kind of variable. Here are some of the variations:
  • LDC loads a numeric constant onto the stack. This instruction receives two modifiers. The first is a type identifier, and the second is the actual numeric value, as in ldc.i4 10.
  • LDLOC loads a local variable onto the stack. There is also a LDLOCA instruction that loads the address of a local variable instead of the contents of the variable. Variables are identified by their position in the .locals section. These instructions use a different syntax for loading positions 4 and beyond, but the index number still appears in the instruction.
  • LDARG loads one of a member's arguments, while the LDARGA instruction loads the address of an argument. Arguments are identified by their position in the method's argument list. These instructions use a different syntax for loading positions 4 and beyond, but the index number still appears in the instruction.
  • LDELEM loads an element of an array onto the stack and is usually preceded by other load statements that supply the array and the index.
  • LDLEN loads the length of an array onto the stack.
  • LDFLD and LDSFLD load class fields (member variables) and static class fields onto the stack. Fields are identified by a fully qualified name.

Each load instruction has a corresponding store instruction, beginning with ST, that stores an item from the stack back into memory. For example, STLOC stores the topmost item in the stack to a local variable. The syntax rules for specifying a variable in a store instruction are usually similar to those of their load counterparts.
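
To make the load/store dance concrete, here is a minimal sketch of an interpreter for four IL-like instructions, written in Python. It is a simplification under stated assumptions: the run function and the tuple-based program format are hypothetical, and real IL has typed values and many more opcodes. The mnemonics themselves (ldc.i4, ldloc, stloc, add) are real ones.

```python
def run(program, local_count):
    """Execute a tiny subset of IL-like instructions and return the locals."""
    stack = []
    local_vars = [0] * local_count
    for opcode, *operand in program:
        if opcode == "ldc.i4":        # push a 32-bit integer constant
            stack.append(operand[0])
        elif opcode == "ldloc":       # push a copy of local variable N
            stack.append(local_vars[operand[0]])
        elif opcode == "stloc":       # pop the topmost item into local variable N
            local_vars[operand[0]] = stack.pop()
        elif opcode == "add":         # pop two items, push their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    return local_vars


# Roughly what "x = 2; y = 3; z = x + y" looks like as loads and stores.
program = [
    ("ldc.i4", 2), ("stloc", 0),      # local 0 = 2
    ("ldc.i4", 3), ("stloc", 1),      # local 1 = 3
    ("ldloc", 0), ("ldloc", 1),       # push both locals
    ("add",), ("stloc", 2),           # local 2 = local 0 + local 1
]

print(run(program, local_count=3))    # prints [2, 3, 5]
```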

Comparison operations
You wouldn't be able to solve many problems with a programming language if you couldn't compare two values and make a decision based on the result. IL has a set of comparison operators, beginning with C, that compare values from the stack. Generally, if the comparison is true, a 1 is pushed onto the stack; if not, a 0 is pushed instead.

Most of these instructions are easily identifiable by their names. For example, CEQ compares two values for equality, while CGT determines whether the second topmost value in the stack (the one pushed first) is greater than the topmost value (the one pushed last). CLT works similarly to CGT but performs a less-than comparison instead.
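
Operand order is the easy thing to get backward here, so a small sketch may help. In the Python below, the compare function is a hypothetical helper, not part of any .NET API. It models the rule that the value pushed first is the left-hand side of the comparison and the value pushed last (the topmost item) is the right-hand side.

```python
def compare(opcode, stack):
    """Pop two values and push 1 or 0, modeling ceq, cgt, and clt."""
    value2 = stack.pop()       # topmost item (pushed last)
    value1 = stack.pop()       # second topmost item (pushed first)
    if opcode == "ceq":
        result = value1 == value2
    elif opcode == "cgt":
        result = value1 > value2
    elif opcode == "clt":
        result = value1 < value2
    else:
        raise ValueError(f"unsupported opcode: {opcode}")
    stack.append(1 if result else 0)


stack = [7, 3]                 # 7 was pushed first; 3 is on top
compare("cgt", stack)          # asks "is 7 > 3?"
print(stack)                   # prints [1]
```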

And you thought Goto was dead
Usually, after comparing two values, you'll want to perform some action based on the result. The IL branch instructions (beginning with BR) jump to another instruction, in most cases based on the contents of the topmost item in the stack. BRTRUE and BRFALSE pop the topmost item off the stack and perform a jump to the indicated line if the item is true (nonzero) or false (0), respectively. Execution continues with the next instruction if the jump is not performed. There's also an unconditional branch operator, BR, which always jumps to the indicated line.
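
Here is a rough Python sketch of how branches turn into a loop. It rests on some assumptions: the run function is hypothetical, and branch targets are plain list indexes rather than the IL_ labels a real listing uses. The program counts down from 3 and records how many times the loop body ran.

```python
def run(program):
    """Run a toy IL-like program with branches; return its two locals."""
    stack, local_vars, pc = [], [0, 0], 0
    while pc < len(program):
        opcode, *operand = program[pc]
        pc += 1                            # default: continue with the next instruction
        if opcode == "ldc.i4":
            stack.append(operand[0])
        elif opcode == "ldloc":
            stack.append(local_vars[operand[0]])
        elif opcode == "stloc":
            local_vars[operand[0]] = stack.pop()
        elif opcode == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "br":               # unconditional jump
            pc = operand[0]
        elif opcode == "brtrue":           # pop; jump if the item is nonzero
            if stack.pop() != 0:
                pc = operand[0]
        elif opcode == "brfalse":          # pop; jump if the item is zero
            if stack.pop() == 0:
                pc = operand[0]
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    return local_vars


program = [
    ("ldc.i4", 3), ("stloc", 0),                          # 0-1: counter = 3
    ("ldloc", 0), ("brfalse", 13),                        # 2-3: leave the loop when counter is 0
    ("ldloc", 1), ("ldc.i4", 1), ("add",), ("stloc", 1),  # 4-7: iterations += 1
    ("ldloc", 0), ("ldc.i4", -1), ("add",), ("stloc", 0), # 8-11: counter -= 1
    ("br", 2),                                            # 12: back to the top of the loop
]

print(run(program))                                       # prints [0, 3]
```

The shape is worth remembering: a conditional branch that skips past the body, plus an unconditional branch back to the test, is a common pattern behind while and for loops in a disassembly.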

You'll find branches implementing the if constructs in the original source code, as well as performing explicit Goto operations. Branch instructions also make up the IL equivalents of the other higher-level flow control constructs: case, while, for, etc.

Creating new objects and calling other code
The CALL and CALLVIRT instructions invoke other methods and functions. CALL usually indicates that the method being called is static or shared, while CALLVIRT is used for instance methods. In both cases, the qualified name of the method is included in the instruction. Any arguments sent to a method are popped off the stack and must be loaded there before the method is called.

Since creating a new object requires a call to a constructor, object creation looks similar to any other method call. Arguments are loaded onto the stack, and then the NEWOBJ instruction is executed, which calls the object's constructor and places a reference to the object back on the stack. The qualified name of the constructor (the class name followed by .ctor) appears in the instruction.
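
The way calls and object creation use the stack can be modeled in a few lines of Python. Everything named here (the call helper, subtract, and the Point class) is a hypothetical stand-in for illustration. The sketch also always pushes a return value, whereas a real method that returns nothing leaves the stack alone.

```python
class Point:
    """Stand-in for a class whose constructor takes two arguments."""
    def __init__(self, x, y):
        self.x = x
        self.y = y


def subtract(a, b):
    return a - b


def call(stack, target, arg_count):
    """Pop arg_count arguments, invoke target, and push what it returns."""
    args = [stack.pop() for _ in range(arg_count)]
    args.reverse()                 # arguments were pushed left to right
    stack.append(target(*args))


stack = [10, 4]                    # load the arguments first...
call(stack, subtract, 2)           # ...then make the call
print(stack)                       # prints [6]

stack = [1, 2]                     # constructor arguments
call(stack, Point, 2)              # like NEWOBJ: leaves an object reference on the stack
print(stack[-1].x, stack[-1].y)    # prints 1 2
```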

That's going to about do it for your whirlwind IL tour. I hope that in addition to satisfying your inner propeller-head, you've picked up enough information to make some sense of an IL listing when presented with one.

