MSIL Programming: Part 1

In this article you will learn that .NET assemblies contain CIL code, which the JIT compiler translates into platform-specific instructions at run time.

Abstract

Source code that executes under the .NET Common Language Runtime (CLR) is referred to as “managed code”. A managed compiler translates the associated *.cs or *.vb code files into low-level CIL code, an assembly manifest, and type metadata. MSIL is itself one of the programming languages supported by the .NET Framework, so .NET applications can also be written, built, and compiled entirely in standalone CIL code. Moreover, because CIL code is the backbone of every .NET assembly, the deeper you dig into the CIL instruction set, the better you will understand advanced .NET application development. This article introduces the MSIL instruction set and its semantics by authoring a simple program in CIL opcodes, and it shows how the CIL compiler ilasm.exe builds that code into an executable .NET assembly without the usual Visual Studio IDE build process.

Essentials

Programming with the CIL instruction set is considered challenging because the developer works directly with the CLR's built-in grammar, known as “opcodes”, instead of the friendlier English-like syntax of C#, F#, or VB.NET. It is therefore advisable to install the following tools on your machine before proceeding.

  • .NET Framework 4.0 or later
  • Visual Studio 2010 IDE or later
  • ILDASM.exe and ILASM.exe utilities
  • Notepad++
  • SharpDevelop (optional)
  • Xamarin Studio (optional)

Although CIL code can be authored in a simple editor such as Notepad, it is recommended to write CIL code in a full-fledged editor such as SharpDevelop.

MSIL Internals

A .NET assembly contains CIL code, which is conceptually similar to Java bytecode in that it is not compiled to platform-specific instructions until absolutely necessary. The CLR provides a JIT compiler for each CPU architecture the runtime targets, each optimized for the underlying platform. .NET binaries also contain metadata that describes the characteristics of every type within the binary. In addition, the assembly itself is described by metadata officially termed a manifest, which records the current version of the assembly, its culture information, and a list of all the externally referenced assemblies.

As Figure 1-1 illustrates, source code written in any .NET supported programming language is eventually compiled into CIL rather than directly into a specific instruction set. This is what makes all the .NET supported languages capable of interacting with each other. Furthermore, CIL code provides the same benefits Java professionals have grown accustomed to.

                                             Figure 1-1. The .NET Compilation Life-Cycle

Each .NET supported programming language maps its keywords to CIL mnemonics. Intermediate Language (IL) code tends to look cryptic at first. For instance, to load a string onto the stack we do not use a friendly opcode name such as StringLoading, but rather ldstr. Consider the following sample program, written in C#, which we will use to examine the CIL code that the compiler generates.

The C# code in Listing 1 performs a simple addition of two numeric values via the testCalculation method. .NET binaries do not contain platform-specific instructions but rather platform-agnostic IL code, which the C# compiler (csc.exe) generates during the build process.

                                                             Listing 1: Simple C# console application

  using System;

  class Program
  {
      static void Main(string[] args)
      {
          // Call the demo method
          testCalculation(20, 40);
          Console.ReadKey();
      }

      // Demo static method: adds two integers and prints the sum
      static void testCalculation(int iPar1, int iPar2)
      {
          int Result;
          Result = iPar1 + iPar2;
          Console.WriteLine("Calculation Output :: {0}", Result);
      }
  }

Once you compile this code, you end up with a single *.exe assembly that contains a manifest, metadata, and CIL instructions. The CLR locates and loads that .NET binary into memory only when the program is run. Fortunately, the .NET Framework SDK ships with an excellent utility for disassembling any .NET binary into its corresponding IL code, referred to as “ILDASM.EXE”.

We can use the ildasm.exe utility to disassemble the IL code either from the command prompt or through its GUI. If you open this assembly with ILDASM.EXE in GUI mode, you will see how each C# statement is represented by the corresponding CIL opcodes, as in the following.

                                 Figure 1-2. The CIL Type.exe assembly

ILDASM.EXE loads any .NET assembly and lets you investigate its contents, including the CIL code, manifest, and metadata. It can also dump all of the metadata from a .NET binary alongside the CIL opcode representation. Double-click the testCalculation method to examine its generated CIL code, as in the following:

                                                                        Figure 1-3. CIL code
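
As a rough textual counterpart to Figure 1-3, the following is a minimal sketch of the kind of CIL that the testCalculation method from Listing 1 compiles to. It is not the exact ILDASM output: a real disassembly also shows IL_xxxx labels, a debug build adds nop instructions, and details vary with the compiler version. The CalcDemo assembly name and the shortened `.assembly extern mscorlib {}` declaration are assumptions made to keep the file small.

```cil
// Sketch only: real ILDASM output varies with compiler version and build settings.
.assembly extern mscorlib {}
.assembly CalcDemo {}            // hypothetical assembly name
.module CalcDemo.exe

.class private auto ansi beforefieldinit Program extends [mscorlib]System.Object
{
  .method private hidebysig static void testCalculation(int32 iPar1, int32 iPar2) cil managed
  {
    .maxstack 2
    .locals init (int32 Result)
    ldarg.0                      // push iPar1
    ldarg.1                      // push iPar2
    add                          // pop both values, push their sum
    stloc.0                      // pop the sum into local 0 (Result)
    ldstr "Calculation Output :: {0}"
    ldloc.0                      // push Result
    box [mscorlib]System.Int32   // box the int32 so it can be passed as object
    call void [mscorlib]System.Console::WriteLine(string, object)
    ret
  }

  .method private hidebysig static void Main(string[] args) cil managed
  {
    .entrypoint
    .maxstack 2
    ldc.i4.s 20                  // first argument
    ldc.i4.s 40                  // second argument
    call void Program::testCalculation(int32, int32)
    call valuetype [mscorlib]System.ConsoleKeyInfo [mscorlib]System.Console::ReadKey()
    pop                          // discard the returned key information
    ret
  }
}
```

Assembled with ILASM.EXE and run, this sketch prints Calculation Output :: 60 and then waits for a key press, matching the behavior of the C# program.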

Furthermore, to explore the type metadata for the currently loaded assembly, press Ctrl + M. This displays the metadata, including that of the testCalculation method, as in the following:

                                                                     Figure 1-4. Metadata

IL Opcode Grammar

CIL is a full-fledged object-oriented programming language like C#, with typical OOP features such as classes, inheritance, interfaces, control statements, and much more. As claimed earlier, we can author a .NET application directly in MSIL without using the Visual Studio IDE. A question commonly arises: why is CIL programming worth understanding? The answer is that it helps developers write, maintain, and debug code better. The following table gives a brief description of typical Common Intermediate Language (CIL) instructions.

                                                               Table: IL opcode meanings
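
To make a few of these opcodes concrete, here is a small hedged sketch that uses load, store, arithmetic, branch, and call instructions in one method. The selection of opcodes is an illustrative choice rather than a reproduction of the table, and the OpcodeDemo names and labels are hypothetical.

```cil
.assembly extern mscorlib {}
.assembly OpcodeDemo {}          // hypothetical assembly name
.module OpcodeDemo.exe

.class private auto ansi beforefieldinit OpcodeDemo.Program extends [mscorlib]System.Object
{
  .method private hidebysig static void Main() cil managed
  {
    .entrypoint
    .maxstack 2
    .locals init (int32 total)

    ldc.i4.7        // load constant: push the int32 value 7
    ldc.i4.5        // push the int32 value 5
    mul             // pop two values, push their product (35)
    stloc.0         // store: pop the top value into local 0 (total)

    ldloc.0         // load local 0 back onto the stack
    ldc.i4.s 10     // push 10 (short form for constants from -128 to 127)
    bgt.s Bigger    // pop two values; branch if the first is greater than the second

    ldstr "total is 10 or less"
    br.s Print      // unconditional branch

  Bigger:
    ldstr "total is greater than 10"

  Print:
    call void [mscorlib]System.Console::WriteLine(string)
    ret
  }
}
```

Because 35 is greater than 10, the branch is taken and the program prints total is greater than 10. Note that both paths reach the Print label with exactly one string on the stack, which is what the call instruction expects.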

In the same way, the following table shows how typical C# keywords (data types) map to the corresponding CIL keywords, which are used frequently in CIL programming.

                                    Table: CIL Data Types Mapping
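
As an illustration of this mapping, the sketch below declares one local variable per common data type, with the corresponding C# keyword in a comment. The TypeMapDemo names and the local variable names are hypothetical, and only a subset of the types is shown.

```cil
.assembly extern mscorlib {}
.assembly TypeMapDemo {}         // hypothetical assembly name
.module TypeMapDemo.exe

.class private auto ansi beforefieldinit TypeMapDemo.Program extends [mscorlib]System.Object
{
  // C# return type void maps to CIL void
  .method private hidebysig static void Main() cil managed
  {
    .entrypoint
    .maxstack 1
    .locals init (
      bool           myBool,    // C# bool
      char           myChar,    // C# char
      int8           mySByte,   // C# sbyte
      unsigned int8  myByte,    // C# byte
      int16          myShort,   // C# short
      int32          myInt,     // C# int
      unsigned int32 myUInt,    // C# uint
      int64          myLong,    // C# long
      float32        myFloat,   // C# float
      float64        myDouble,  // C# double
      string         myString,  // C# string
      object         myObject   // C# object
    )

    ldc.i4.s 42                 // push the int32 constant 42
    stloc myInt                 // store it in the int32 local
    ldloc myInt                 // load it again
    call void [mscorlib]System.Console::WriteLine(int32)
    ret
  }
}
```

The same CIL type names appear in method signatures, as in the WriteLine(int32) call above, which is why they are worth memorizing early.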

Creating First IL Program

So, are you ready to take up the challenge? Authoring pure IL code is more cumbersome than writing C# code. We can develop any kind of application, for instance a console, Windows, or web-based application, but the foremost hindrance when coding is the lack of IntelliSense support. On the other hand, IL code can be written in any ordinary editor such as Notepad; this is the real beauty of IL coding. We will write a simple “Hello World!” program using Notepad and later compile it using the ILASM.EXE utility. Open Notepad, save the file with the *.il extension (such as “HelloWorld.il”), and enter the following code, which displays a simple “Hello World!” string on the console.

                                             Listing 2: First “Hello World” program coding in IL
  .assembly extern mscorlib
  {
    .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )
    .ver 4:0:0:0
  }
  .assembly cilHelloWorld
  {
    .hash algorithm 0x00008004
    .ver 1:0:0:0
  }
  .module cilHelloWorld.exe

  .imagebase           0x00400000

  .file alignment      0x00000200
  .stackreserve        0x00100000
  .subsystem           0x0003
  .corflags            0x00020003


  // =============== CLASS MEMBERS DECLARATION ===================//

  .class private auto ansi beforefieldinit cilHelloWorld.Program extends [mscorlib]System.Object
  {
    .method private hidebysig static void  Main(string[] args) cil managed
    {
      .entrypoint

      .maxstack  8

      IL_0000:  nop
      IL_0001:  ldstr      "First CIL program, Hello World!"
      IL_0006:  call       void [mscorlib]System.Console::WriteLine(string)
      IL_000b:  nop
      IL_000c:  call       string [mscorlib]System.Console::ReadLine()
      IL_0011:  pop
      IL_0012:  ret
    }
    //=================Constructor================================//
    .method public hidebysig specialname rtspecialname instance void  .ctor() cil managed
    {

      .maxstack  8
      IL_0000:  ldarg.0
      IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
      IL_0006:  ret
    } // end of constructor
  }
  // ======================End of Class================================//

As the listing shows, we have specified a .NET namespace, class, and methods in terms of CIL, using various directives and attributes, to accomplish the simple “Hello World!” feat. The important thing to remember is that CIL directives are always written with a dot prefix (for example .assembly, .class, and .method), whereas CIL attributes (such as public, static, and extends) are not.

Finally, save the code file and open a Visual Studio command prompt to process it with ILASM.EXE, which assembles the HelloWorld.il file and produces a corresponding executable file, as in the following:

                     Output: HelloWorld.il compilation process using ILASM.EXE

After finishing the IL coding, or after any kind of subtle code modification, it is recommended to verify the compiled .NET binary using the PEVERIFY.EXE command-line utility, which checks that the CIL code and metadata in the specified assembly are valid and verifiably type safe, as in the following:


                                    Output: CompileHelloWorld.exe verification

Finally, it is time to test the generated .NET assembly (executable) to determine whether it produces the desired output. Run the executable directly at the command prompt and observe the output, as in the following:

                                    Output: CompileHelloWorld.exe execution

Programmers usually need not be deeply concerned with binary opcodes unless they build extremely low-level .NET software. Instead, CIL coding especially attracts reverse engineers, who patch buggy software and detect subtle vulnerabilities by disassembling executables. Sometimes developers inadvertently leave glitches in the source code that malicious hackers can exploit later. Reverse engineers also typically use CIL code to add or remove features in existing software when the source code is not available.

Code Analysis

The HelloWorld.il file commences with the .assembly extern directive, which references the mscorlib.dll assembly. The .publickeytoken directive specifies the public key token of mscorlib.dll, and the .ver directive specifies the version of the referenced mscorlib.dll assembly (4.0.0.0 here).

    .assembly extern mscorlib
    {
       .publickeytoken = (B7 7A 5C 56 19 34 E0 89)
       .ver 4:0:0:0
    }

The next section defines the assembly itself, named “cilHelloWorld”, followed by its hashing algorithm and its version number 1.0.0.0.

    .assembly cilHelloWorld
    {
       .hash algorithm 0x00008004
       .ver 1:0:0:0
    }

Then the .module directive names the module (the physical file) that makes up the assembly, here the executable cilHelloWorld.exe.

    .module cilHelloWorld.exe

Thereafter, the .imagebase directive sets 0x00400000 as the preferred base address at which the binary is loaded.

    .imagebase 0x00400000

The .file alignment directive sets the alignment of the sections within the output PE file, here 0x200 (512) bytes:

    .file alignment 0x00000200

The .stackreserve directive sets the size of the stack reserved for the program, here 0x00100000 (1 MB).

    .stackreserve 0x00100000

The .subsystem directive indicates whether the application is a console-based or a GUI-based program. Here 3 specifies console based, whereas 2 would specify GUI based:

    .subsystem 0x0003

The .corflags directive sets the runtime flags stored in the CLI header of the file, as in the following:

    .corflags 0x00020003

After defining all the essential directives, such as .module, .corflags, .imagebase and so on, we define the Program class type, which extends the System.Object base type. Here, the beforefieldinit attribute tells the runtime that the type's initializer may run at any time before the first access to one of the type's static fields, as in the following:

    .class private auto ansi beforefieldinit cilHelloWorld.Program extends [mscorlib]System.Object

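
The Program class in the listing has no static fields, so beforefieldinit has no visible effect there. The following hedged sketch shows a case where it matters: a hypothetical InitDemo.Settings class with a static field that is assigned in a type initializer (.cctor). All names in it are assumptions made for illustration.

```cil
.assembly extern mscorlib {}
.assembly InitDemo {}            // hypothetical assembly name
.module InitDemo.exe

.class private auto ansi beforefieldinit InitDemo.Settings extends [mscorlib]System.Object
{
  .field public static int32 Answer

  // Type initializer (static constructor). Because the class is marked
  // beforefieldinit, the runtime may run this at any time before the
  // first access to a static field such as Answer.
  .method private hidebysig specialname rtspecialname static void .cctor() cil managed
  {
    .maxstack 1
    ldc.i4.s 42
    stsfld int32 InitDemo.Settings::Answer   // Answer = 42
    ret
  }
}

.class private auto ansi beforefieldinit InitDemo.Program extends [mscorlib]System.Object
{
  .method private hidebysig static void Main() cil managed
  {
    .entrypoint
    .maxstack 1
    ldsfld int32 InitDemo.Settings::Answer   // first access to the static field
    call void [mscorlib]System.Console::WriteLine(int32)
    ret
  }
}
```

Either way the program prints 42, since the initializer is guaranteed to have run before the field is read. The attribute only relaxes when the runtime is allowed to run it.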
Although we shall discuss all the .NET type definitions in terms of IL coding in detail in forthcoming articles, here the IL file also defines the default class constructor, as in the following:

    .method public hidebysig specialname rtspecialname instance void .ctor() cil managed

The Program class contains the application entry point method, void Main. Here, the hidebysig attribute indicates that the method hides any inherited member with the same name and signature, as in the following:

    .method private hidebysig static void Main(string[] args) cil managed

The method that serves as the entry point of a program must always contain the following directive:

    .entrypoint

The .maxstack directive specifies the maximum number of items that may be on the evaluation stack at any one time while the method executes; 8 is the default value.

    .maxstack 8

Now the real implementation starts in the Main() method body. Each instruction is preceded by a token such as IL_0001 or IL_0006; these tokens are called code labels. In fact, these code labels are completely optional and we can remove them, as in the following:

[Figure: the Main() method body with the code labels removed]
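
A minimal sketch of the same program with the code labels removed is given here. It keeps the instructions of Main() from Listing 2 unchanged and annotates the state of the evaluation stack after each one. To stay short it omits the constructor and the optional PE directives and uses empty `{}` bodies for the assembly declarations, which is an assumption of this sketch rather than part of the original listing.

```cil
.assembly extern mscorlib {}
.assembly cilHelloWorld {}
.module cilHelloWorld.exe

.class private auto ansi beforefieldinit cilHelloWorld.Program extends [mscorlib]System.Object
{
  .method private hidebysig static void Main(string[] args) cil managed
  {
    .entrypoint
    .maxstack 8

    nop                                                    // stack: (empty)
    ldstr "First CIL program, Hello World!"                // stack: string
    call void [mscorlib]System.Console::WriteLine(string)  // stack: (empty)
    nop
    call string [mscorlib]System.Console::ReadLine()       // stack: string (the line read)
    pop                                                    // stack: (empty), value discarded
    ret
  }
}
```

ILASM.EXE computes the instruction offsets itself, so labels are only needed as targets for branch instructions. Saved under a hypothetical name such as HelloWorldNoLabels.il, the file can be assembled with `ilasm HelloWorldNoLabels.il`.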

In the previous code, execution starts with a nop, which performs no operation. Then the ldstr instruction pushes a reference to the string “First CIL program, Hello World!” onto the evaluation stack, and the call instruction invokes the Console.WriteLine method to print that string. After another nop, the second call instruction invokes Console.ReadLine, which waits for input and pushes the string it returns onto the stack. The pop instruction then removes that unused value from the top of the stack and discards it, and the program terminates with a ret instruction.

Synopsis

As we have seen, .NET assemblies contain CIL code that the JIT compiler translates into platform-specific instructions. In addition, we have explored assembly metadata and manifest contents by examining the CIL opcodes using the ILDASM.EXE utility, along with descriptions of the keywords typically used in CIL coding. Using the essential IL directives and opcodes, we drafted our first “Hello World!” program in genuine IL code and learned how to compile and verify it.
