.NET Binary Reverse Engineering: Part 1

Introduction

 
The prime objective of this article is to explain the mother language of .NET, the Common Intermediate Language (CIL), on which the whole platform is built. You will learn the distinction between CIL directives, attributes, and opcodes, as well as the CIL tools that play a significant role in building and examining executable code. The purpose of this article is to provide a deep analysis and examination of CIL grammar.
 
The source code of commercial software is the intellectual property of the vendor and is normally not disclosed. Without the source code, we have to work from the compiled binary itself, so it is necessary to understand CIL before proceeding to code disassembly. In the forthcoming articles of this series, we shall also discuss more advanced reverse engineering concepts such as round-trip engineering, obfuscation, and code disassembly, using tools such as IDA Pro, OllyDbg, a hex editor, ILASM, and Reflector.
 
Abstract
 
Microsoft Intermediate Language (MSIL), Microsoft's original name for CIL, is an essential piece of the CLR. Code that is written for and executed under the CLR is referred to as managed code. A managed compiler translates source code (for example, a *.cs file) into CIL code, a manifest, and metadata. Execution therefore typically involves two compilation phases. In the first phase, the language compiler transforms the source code into MSIL. The second phase occurs at run time, when the JIT compiler compiles the MSIL into native code. The .NET platform is considered language-independent because a managed application executes the same way regardless of its source language. Finally, CIL is a full-fledged .NET programming language in its own right, with its own syntax and compiler.
 
The beauty of MSIL code is that it is compiled once and can then run on any platform that has a compatible runtime. At run time, the JIT compiler translates it into native binary code for that specific platform. You can write an application once and deploy it to Windows, Linux, macOS, and other platforms that support a .NET runtime (for example, Mono).
 
Prerequisite
 
To execute and examine MSIL/CIL code, you need to configure your machine with the following tools:
  • .NET Framework 3.5 or higher
  • Either SharpDevelop or Xamarin Studio
  • Visual Studio Command Prompt
  • IL Assembler (ilasm.exe) and IL Disassembler (ildasm.exe)
  • PEVerify (peverify.exe)
  • .NET Reflector
Understanding CIL
 
When you build a .NET assembly using your managed language of choice (including C#, VB.NET, F#, Perl, and COBOL), the associated compiler translates your source code into Common Intermediate Language. Because CIL is itself a structured .NET programming language, you can also build .NET assemblies directly in CIL with the CIL compiler (ilasm.exe) that ships with the .NET Framework.
 
The better you understand the grammar of CIL, the better equipped you are to move into advanced .NET programming. A programmer with a comprehensive knowledge of CIL can do the following:
  • Disassemble an existing assembly, edit the CIL code, and recompile the updated code.
  • Access every aspect of the CTS and CLS, since CIL is the only .NET language that exposes all of them.
  • Build dynamic assemblies in-house using the System.Reflection.Emit namespace API.
CIL does not simply define a general set of keywords such as public, private, new, get, set, and this. Instead, the tokens understood by the CIL compiler are subdivided into three categories, and each category is expressed with its own syntax. The three categories are as follows.
 
CIL Directives
 
CIL directives are the CIL tokens that describe the structure of a .NET assembly. They are written with a single dot prefix (for example, .class and .assembly). They tell the CIL compiler how to define the namespaces, classes, and methods that will populate an assembly.
 
CIL Attributes
 
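Sometimes a CIL directive alone is not descriptive enough to fully express the definition of a given type. In that case, it can be further qualified with CIL attributes that specify how the directive should be processed. For example, in .class public auto ansi beforefieldinit Test, the words public, auto, ansi, and beforefieldinit are attributes of the .class directive.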
 
CIL Opcodes
 
The operation codes, or opcodes, provide a type's implementation logic once the namespace and types of a .NET assembly have been defined in CIL code. Examples include ldstr, call, and ret.
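The three categories can be told apart fairly mechanically. The following Python sketch is a toy classifier and is not part of any CIL toolchain. The function classify and the two token sets are hypothetical, and the sets hold only a handful of sample keywords rather than the full lists from the CLI specification.

```python
# Hypothetical toy classifier. The sets below are small, hand-picked samples;
# the real attribute and opcode lists in the CLI specification are much longer.
SAMPLE_ATTRIBUTES = {"public", "private", "auto", "ansi", "beforefieldinit",
                     "hidebysig", "static", "cil", "managed"}
SAMPLE_OPCODES = {"ldstr", "call", "ret", "nop", "add", "ldarg.0", "ldarg.1"}


def classify(token: str) -> str:
    """Return 'directive', 'attribute', 'opcode', or 'other' for a CIL token."""
    if token.startswith("."):
        return "directive"      # naive rule: method names such as .ctor also match
    if token in SAMPLE_ATTRIBUTES:
        return "attribute"      # qualifies how a directive is processed
    if token in SAMPLE_OPCODES:
        return "opcode"         # implementation logic inside a method body
    return "other"              # type names, identifiers, literals, punctuation


for token in ".class private auto ansi beforefieldinit Test".split():
    print(f"{token:16}{classify(token)}")
```

For the sample line, .class is reported as a directive, the next four tokens as attributes, and Test as other. A real assembler works from a full grammar. The leading-dot rule on its own would misclassify method names such as .ctor.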
 
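Despite its numerous advantages, CIL programming has drawbacks, notably around type safety. Hand-written CIL gets none of the type checking that a high-level compiler such as the C# compiler performs. ILASM will generally assemble an instruction stream that is not type-safe, and such code may fail verification or throw an InvalidProgramException when the JIT compiler processes it.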
 
First CIL Program
 
We need a code editor to author our first CIL program. A plain text editor such as Notepad or WordPad will do. It is better, however, to use a full-fledged open source .NET IDE such as SharpDevelop or Xamarin Studio, because these integrate with the .NET FCL and recognize CIL directives automatically. Whichever IDE or editor you use, the important point is to save the CIL code file with the *.il extension.
 
The following code is a first hello world program written in CIL. Open Notepad, enter the code, and save the file as Test.il.
  .assembly extern mscorlib {}
  .assembly FirstApp
  {
  }

  .namespace FirstApp
  {
      .class private auto ansi beforefieldinit Test
      {
          .method public hidebysig static void Main(string[] args) cil managed
          {
              .entrypoint
              .maxstack    1
              ldstr        "Welcome to CIL programming world"
              call         void [mscorlib]System.Console::WriteLine(string)
              ret
          }
      }
  }
File: Test.il
 
CIL Code Compilation

After saving the file as Test.il, compile it with ILASM.exe, the IL assembler that ships with the .NET Framework, using the following command:
 
ILASM /exe /debug Test.il
 
Here, the /exe option tells ILASM to build an executable, in this case a console-based application, rather than a library (/dll). The /debug option asks the compiler to generate a debug file (Test.pdb) for the application. This file is useful for viewing the source code in a debugger or disassembler.
 
 
After Test.il compiles successfully, the final executable, Test.exe, is created in the same directory. Running it prints "Welcome to CIL programming world" to the console.
 
 
When you build or modify assemblies using CIL code, you should always verify that the compiled binary image is a well-formed .NET image. To do so, run the peverify.exe utility against it:

peverify Test.exe

When PEVerify reports that all classes and methods in Test.exe verified, every opcode within the binary is valid CIL. The CIL compiler itself also has numerous command-line options, such as /dll, /output, and /quiet; running ilasm without arguments lists them all.
 
 
In the previous CIL code source file Test.il, the first declaration is an external reference to the mscorlib library. The mscorlib.dll contains the core of the .NET Framework FCL that includes the System.Console class. The second assembly directive is simply the name of the assembly, FirstApp, and the third directive defines the namespace.
  .assembly extern mscorlib {}
  .assembly FirstApp
  {
  }
  // class namespace
  .namespace FirstApp
  { …… }
     
The next lines define a class and a method within the class. The .class directive introduces a private (assembly-internal) class named Test that implicitly inherits from System.Object. The .method directive defines Main as a public static member method. The cil managed keywords indicate that the method body contains CIL code and is managed.
  .class private auto ansi beforefieldinit Test
  {
      .method public hidebysig static void Main(string[] args) cil managed
      { … }
  }
The Main method commences with two directives. The .entrypoint directive designates Main as the entry point of the application. The .maxstack directive sets the maximum depth of the evaluation stack to one slot. The remaining lines are opcodes rather than directives. The ldstr instruction pushes a reference to the string literal onto the evaluation stack. The call instruction pops that item off the stack and passes it to the Console.WriteLine method, which displays it. Finally, the ret instruction returns from the method.
 
.entrypoint
.maxstack 1
ldstr "Welcome to CIL programming world"
call void [mscorlib] System.Console::WriteLine(string)
ret
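To see what .maxstack 1 means in practice, the following Python sketch models the evaluation stack while these three instructions run. It is a simplified model, not the CLR. The function run and the HELLO_BODY list are hypothetical, and the only call target the model understands is Console::WriteLine(string), which it "prints" by appending to a list.

```python
# Hypothetical toy model of the evaluation stack; only ldstr, ret and a call to
# System.Console::WriteLine(string) are understood.
def run(instructions, max_stack):
    """Execute (opcode, operand) pairs; return printed lines and peak depth."""
    stack, output, peak = [], [], 0
    for opcode, operand in instructions:
        if opcode == "ldstr":
            stack.append(operand)                  # push the string reference
        elif opcode == "call" and operand == "System.Console::WriteLine(string)":
            if not stack:
                raise RuntimeError("stack underflow")
            output.append(stack.pop())             # pop the argument and "print" it
        elif opcode == "ret":
            break                                  # leave the method
        else:
            raise ValueError(f"unsupported instruction: {opcode}")
        peak = max(peak, len(stack))
        if peak > max_stack:
            raise RuntimeError("evaluation stack exceeds .maxstack")
    return output, peak


# The body of Main from Test.il, one (opcode, operand) pair per instruction.
HELLO_BODY = [
    ("ldstr", "Welcome to CIL programming world"),
    ("call", "System.Console::WriteLine(string)"),
    ("ret", None),
]

lines, depth = run(HELLO_BODY, max_stack=1)
print(lines[0])
print(depth)
```

The model prints the string and a peak depth of 1, which is why .maxstack 1 is sufficient. With max_stack=0, it raises an error as soon as ldstr pushes its operand. The real CLR checks .maxstack when it verifies or JIT-compiles the method, not while the instructions execute.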
 
CIL Code Post-mortem Analysis
 
CIL is much easier to read and interpret than native assembly language. Like C#, CIL source code is case-sensitive, but its statements are not terminated with a semicolon. Beyond that, a CIL program consists mainly of dot-prefixed directives and the actual executable instructions. The directives fall into several categories, such as assembly, class, and method directives.
 
To understand CIL directives better, we shall use Xamarin Studio to write a console application that adds two integers. Such an application could be developed in other code editors, but Xamarin Studio offers better support for writing IL code than most of them.
 
So first open Xamarin Studio and select New Solution from the File menu. Then choose the IL type Console Project from the project templates.
 
 
Thereafter, rename main.il to MathFun.il and place the following code in it. We shall discuss each segment of the *.il file in the following sections.
  .assembly extern mscorlib
  {
    .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )
    .ver 2:0:0:0
  }
  .assembly MathFun
  {
    .ver 1:0:0:0
    .locale "en-US"
  }
  .module MathFun.exe

  .imagebase 0x00400000
  .file alignment 0x00000200
  .stackreserve 0x00100000
  .subsystem 0x0003
  .corflags 0x00000003

  // =============== CLASS MEMBERS DECLARATION ===================

  .class public auto ansi beforefieldinit MathFun
         extends [mscorlib]System.Object
  {
    .field private string '<Name>k__BackingField'
    .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
    .method public hidebysig specialname rtspecialname
            instance void .ctor(string name) cil managed
    {
      // Code size       18 (0x12)
      .maxstack  8
      IL_0000:  ldarg.0
      IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
      IL_0006:  nop
      IL_0007:  nop
      IL_0008:  ldarg.0
      IL_0009:  ldarg.1
      IL_000a:  call       instance void MathFun::set_Name(string)
      IL_000f:  nop
      IL_0010:  nop
      IL_0011:  ret
    } // end of method MathFun::.ctor

    .method public hidebysig specialname instance string get_Name() cil managed
    {
      .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
      // Code size       11 (0xb)
      .maxstack  1
      .locals init (string V_0)
      IL_0000:  ldarg.0
      IL_0001:  ldfld      string MathFun::'<Name>k__BackingField'
      IL_0006:  stloc.0
      IL_0007:  br.s       IL_0009

      IL_0009:  ldloc.0
      IL_000a:  ret
    } // end of method MathFun::get_Name

    .method public hidebysig specialname instance void set_Name(string 'value') cil managed
    {
      .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
      // Code size       8 (0x8)
      .maxstack  8
      IL_0000:  ldarg.0
      IL_0001:  ldarg.1
      IL_0002:  stfld      string MathFun::'<Name>k__BackingField'
      IL_0007:  ret
    } // end of method MathFun::set_Name

    .method public hidebysig instance string Display() cil managed
    {
      // Code size       22 (0x16)
      .maxstack  2
      .locals init ([0] string CS$1$0000)
      IL_0000:  nop
      IL_0001:  ldstr      "Hello "
      IL_0006:  ldarg.0
      IL_0007:  call       instance string MathFun::get_Name()
      IL_000c:  call       string [mscorlib]System.String::Concat(string,string)
      IL_0011:  stloc.0
      IL_0012:  br.s       IL_0014

      IL_0014:  ldloc.0
      IL_0015:  ret
    } // end of method MathFun::Display

    .method public hidebysig instance int32 Addition(int32 x, int32 y) cil managed
    {
      // Code size       9 (0x9)
      .maxstack  2
      .locals init ([0] int32 CS$1$0000)
      IL_0000:  nop
      IL_0001:  ldarg.1
      IL_0002:  ldarg.2
      IL_0003:  add
      IL_0004:  stloc.0
      IL_0005:  br.s       IL_0007

      IL_0007:  ldloc.0
      IL_0008:  ret
    } // end of method MathFun::Addition

    .property instance string Name()
    {
      .get instance string MathFun::get_Name()
      .set instance void MathFun::set_Name(string)
    } // end of property MathFun::Name
  } // end of class MathFun

  // Entry-point class: it needs a name distinct from MathFun (Program is used here).
  .class private auto ansi beforefieldinit Program extends [mscorlib]System.Object
  {
    .method private hidebysig static void Main(string[] args) cil managed
    {
      .entrypoint
      // Code size       57 (0x39)
      .maxstack  4
      .locals init ([0] class MathFun obj)
      IL_0000:  nop
      IL_0001:  ldstr      "Ajay"
      IL_0006:  newobj     instance void MathFun::.ctor(string)
      IL_000b:  stloc.0
      IL_000c:  ldloc.0
      IL_000d:  callvirt   instance string MathFun::Display()
      IL_0012:  call       void [mscorlib]System.Console::WriteLine(string)
      IL_0017:  nop
      IL_0018:  ldstr      "Addition is: {0}"
      IL_001d:  ldloc.0
      IL_001e:  ldc.i4.s   15
      IL_0020:  ldc.i4.s   35
      IL_0022:  callvirt   instance int32 MathFun::Addition(int32,int32)
      IL_0027:  box        [mscorlib]System.Int32
      IL_002c:  call       void [mscorlib]System.Console::WriteLine(string,object)
      IL_0031:  nop
      IL_0032:  call       valuetype [mscorlib]System.ConsoleKeyInfo [mscorlib]System.Console::ReadKey()
      IL_0037:  pop
      IL_0038:  ret
    } // end of method Program::Main
  } // end of class Program
File: MathFun.il
 
Now build the program by pressing F8. After a successful build, the final executable, MathFun.exe, is created in the bin/Debug folder of the project directory.
 
Assembly Directives
 
Assembly directives supply the information that the compiler writes to the manifest, that is, the metadata pertaining to the assembly as a whole. This section lists the common assembly directives, together with the related module and PE-file directives used in MathFun.il.
 
.assembly extern
 
This directive represents an external assembly. The public types and methods of the referenced assembly are available to the current assembly. Here is the syntax:
 
.assembly extern name as aliasname { }
 
MathFun.il implements this construct by referencing mscorlib.dll as follows:
  .assembly extern mscorlib
  {
    .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )
    .ver 2:0:0:0
  }
Because of the importance of mscorlib.dll, the ILASM compiler automatically includes an external assembly reference to that library.
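The .publickeytoken in this reference is a short form of the referenced assembly's full public key. The following Python sketch assumes the standard strong-name rule: the token is the last 8 bytes of the SHA-1 hash of the public key, in reverse byte order. The functions public_key_token and format_token are hypothetical helpers, not part of any .NET tool.

```python
import hashlib


def public_key_token(public_key: bytes) -> bytes:
    """Hypothetical helper: derive the 8-byte token from a public key blob,
    assuming the rule 'last 8 bytes of SHA-1(key), reversed'."""
    digest = hashlib.sha1(public_key).digest()   # 20-byte SHA-1 hash
    return digest[-8:][::-1]                     # low-order 8 bytes, reversed


def format_token(token: bytes) -> str:
    """Hypothetical helper: format a token the way the listing shows it."""
    return "(" + " ".join(f"{b:02X}" for b in token) + " )"


print(format_token(bytes([0xB7, 0x7A, 0x5C, 0x56, 0x19, 0x34, 0xE0, 0x89])))
```

Given the bytes of a referenced assembly's full public key (as listed in the .publickey sub-directive of that assembly's own manifest), public_key_token should reproduce the token written in the .assembly extern block.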
 
.assembly
 
The .assembly directive defines the simple name of the assembly, that is, the friendly name of the binary, as follows:
 
.assembly CILType { }
 
The assembly block can contain several sub-directives, including the following:
  • .ver
  • .locale
  • .publickey
In MathFun.il, the assembly definition includes a version number of 1.0.0.0, set with the .ver directive, and culture information, set with the .locale directive:
  .assembly MathFun
  {
    .ver 1:0:0:0
    .locale "en-US"
  }
.module
 
The .module directive specifies the name of the module, that is, the physical file that makes up the assembly, including its extension (such as *.exe):
 
.module MathFun.exe
 
.imagebase
 
The .imagebase directive sets the preferred base address at which the application is loaded. The default is 0x00400000.
 
.imagebase 0x00400000
 
.file
 
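The .file directive adds a file to the manifest of the assembly, which is useful for associating helper documents with an assembly. Its nometadata option stipulates that the file is unmanaged, that is, that it contains no metadata.

MathFun.il, however, uses the separate .file alignment directive. This directive sets the alignment of the sections within the PE file, and 0x00000200 (512 bytes) is the default:

.file alignment 0x00000200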
 
.stackreserve
 
The .stackreserve directive sets the stack reserve size recorded in the PE header; 0x00100000 (1 MB) is the default.
 
.stackreserve 0x00100000
 
.subsystem
 
The .subsystem directive indicates the subsystem used by the application, such as the console or GUI subsystem. The syntax is as follows:
 
.subsystem number
 
Specify 3 for console applications and 2 for GUI applications. The following line therefore builds a console application:
 
.subsystem 0x0003
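The .imagebase, .file alignment, .stackreserve, and .subsystem directives all end up as fields of the PE optional header, so you can confirm their values in a compiled binary. The following Python sketch reads them from a 32-bit (PE32) image using only the standard struct module. The function name read_pe32_settings is hypothetical. PE32+ (64-bit) images use a different layout, so the sketch rejects them rather than parsing them.

```python
import struct


def read_pe32_settings(image: bytes) -> dict:
    """Hypothetical helper: return the PE32 optional-header fields that
    .imagebase, .file alignment, .stackreserve and .subsystem control."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ (DOS header) signature")
    pe_offset = struct.unpack_from("<I", image, 0x3C)[0]   # e_lfanew field
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    opt = pe_offset + 4 + 20     # skip the signature and the 20-byte COFF header
    magic = struct.unpack_from("<H", image, opt)[0]
    if magic != 0x10B:           # 0x10B = PE32; PE32+ (0x20B) has another layout
        raise ValueError(f"not a PE32 image (magic 0x{magic:X})")
    return {
        "imagebase": struct.unpack_from("<I", image, opt + 28)[0],
        "file_alignment": struct.unpack_from("<I", image, opt + 36)[0],
        "subsystem": struct.unpack_from("<H", image, opt + 68)[0],
        "stackreserve": struct.unpack_from("<I", image, opt + 72)[0],
    }
```

For example, read_pe32_settings(pathlib.Path("MathFun.exe").read_bytes()) should return the four values declared in MathFun.il, provided ILASM produced a 32-bit PE image.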
 
.corflags
 
The .corflags directive sets the runtime flags in the CLI header. The value 0x00000001, the default, marks an IL-only assembly. MathFun.il uses 0x00000003, which additionally sets the 32-bit-required flag (0x00000002):

.corflags 0x00000003
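To see which runtime flags a .corflags value sets, the following Python sketch decodes it bit by bit. The flag names follow the COMIMAGE_FLAGS_* constants from the Windows SDK header corhdr.h. The function decode_corflags is a hypothetical helper, and only the commonly documented bits are included.

```python
# Commonly used COMIMAGE_FLAGS_* bits of the CLI header's Flags field
# (names as in corhdr.h); decode_corflags below is a hypothetical helper.
CORFLAGS = {
    0x00000001: "ILONLY",
    0x00000002: "32BITREQUIRED",
    0x00000004: "IL_LIBRARY",
    0x00000008: "STRONGNAMESIGNED",
    0x00000010: "NATIVE_ENTRYPOINT",
    0x00010000: "TRACKDEBUGDATA",
    0x00020000: "32BITPREFERRED",
}


def decode_corflags(value: int) -> list[str]:
    """Return the names of the flags set in value, lowest bit first."""
    names = [name for bit, name in sorted(CORFLAGS.items()) if value & bit]
    unknown = value & ~sum(CORFLAGS)      # bits not listed above (keys are disjoint)
    if unknown:
        names.append(f"UNKNOWN(0x{unknown:08X})")
    return names


print(decode_corflags(0x00000003))
```

For the value used in MathFun.il, the sketch reports ILONLY and 32BITREQUIRED.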
 
.maxstack
 
The .maxstack directive appears inside a method body. It establishes the maximum number of items that may be on the evaluation stack at any point while the method executes. If it is omitted, the default value is 8:

.maxstack 8
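Choosing a .maxstack value comes down to finding the deepest point the evaluation stack reaches. The following Python sketch does this for the straight-line body of Main in MathFun.il, under several assumptions. The pop and push counts in MAIN_BODY were worked out by hand from each instruction's semantics and call signature; they are not read from an opcode table. The function max_stack_depth is hypothetical. The sketch also ignores branches, so it only suits methods without control flow.

```python
# (instruction, items popped, items pushed) for Main in MathFun.il.
# Counts were derived by hand from the call signatures (an assumption of this sketch).
MAIN_BODY = [
    ("nop", 0, 0),
    ('ldstr "Ajay"', 0, 1),
    ("newobj MathFun::.ctor(string)", 1, 1),            # ctor argument in, new object out
    ("stloc.0", 1, 0),
    ("ldloc.0", 0, 1),
    ("callvirt MathFun::Display()", 1, 1),              # 'this' in, string out
    ("call Console::WriteLine(string)", 1, 0),
    ("nop", 0, 0),
    ('ldstr "Addition is: {0}"', 0, 1),
    ("ldloc.0", 0, 1),
    ("ldc.i4.s 15", 0, 1),
    ("ldc.i4.s 35", 0, 1),
    ("callvirt MathFun::Addition(int32,int32)", 3, 1),  # this, x, y in; sum out
    ("box System.Int32", 1, 1),
    ("call Console::WriteLine(string,object)", 2, 0),
    ("nop", 0, 0),
    ("call Console::ReadKey()", 0, 1),
    ("pop", 1, 0),
    ("ret", 0, 0),
]


def max_stack_depth(body):
    """Hypothetical helper: deepest stack depth reached by a branch-free body.

    Raises ValueError if an instruction would pop more items than are present.
    """
    depth = peak = 0
    for instruction, pops, pushes in body:
        if pops > depth:
            raise ValueError(f"stack underflow at {instruction}")
        depth += pushes - pops
        peak = max(peak, depth)
    return peak


print(max_stack_depth(MAIN_BODY))
```

The result, 4, matches the .maxstack 4 that MathFun.il declares for Main. The deepest point comes just before callvirt Addition, when the format string, obj, 15, and 35 are all on the stack.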
 
Class Directives
 
This section describes the most important class-level directives.
 
.class
 
The .class directive defines a new reference, value, or interface type. Here, the syntax is as in the following:
 
.class attributes classname extends basetype implements interface
 
In MathFun.il, the MathFun class is implemented with the .class directive as follows:

.class public auto ansi beforefieldinit MathFun
       extends [mscorlib]System.Object
 
The .class directive can also take a variety of attributes. Here is a short list of the most common:
  • abstract: the class cannot be instantiated.
  • ansi and unicode: determine how strings are marshaled to unmanaged code.
  • auto: the CLR controls the memory layout of the fields.
  • beforefieldinit: the runtime may run the type initializer (static constructor) at any time before the first access to a static field, rather than exactly when the type is first used.
  • private and public: set the visibility of the type outside the assembly.
The MathFun class also defines a constructor that initializes the Name property, as in the following C# version:
  public MathFun(string name)
  {
      this.Name = name;
  }
Its IL counterpart is as follows:
  .field private string '<Name>k__BackingField'
  .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
  .method public hidebysig specialname rtspecialname
          instance void .ctor(string name) cil managed
  {
    // Code size       18 (0x12)
    .maxstack  8
    IL_0000:  ldarg.0
    IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
    IL_0006:  nop
    IL_0007:  nop
    IL_0008:  ldarg.0
    IL_0009:  ldarg.1
    IL_000a:  call       instance void MathFun::set_Name(string)
    IL_000f:  nop
    IL_0010:  nop
    IL_0011:  ret
  }
.property
 
The .property directive adds a property member to a class. The syntax is as follows:

.property attributes return propertyname parameters default { body }
 
If we define a property in C# code as in the following:
  public String Name
  {
      get;
      set;
  }
then its MSIL counterpart consists of a get_Name accessor method, a set_Name accessor method, and a .property declaration that ties them together:
  .method public hidebysig specialname instance string
          get_Name() cil managed
  {
    .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
    // Code size       11 (0xb)
    .maxstack  1
    .locals init (string V_0)
    IL_0000:  ldarg.0
    IL_0001:  ldfld      string MathFun::'<Name>k__BackingField'
    IL_0006:  stloc.0
    IL_0007:  br.s       IL_0009

    IL_0009:  ldloc.0
    IL_000a:  ret
  } // end of method MathFun::get_Name

  .method public hidebysig specialname instance void
          set_Name(string 'value') cil managed
  {
    .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
    // Code size       8 (0x8)
    .maxstack  8
    IL_0000:  ldarg.0
    IL_0001:  ldarg.1
    IL_0002:  stfld      string MathFun::'<Name>k__BackingField'
    IL_0007:  ret
  } // end of method MathFun::set_Name

  .property instance string Name()
  {
    .get instance string MathFun::get_Name()
    .set instance void MathFun::set_Name(string)
  } // end of property MathFun::Name
.method
 
This directive defines a method in a class. Here is the syntax:
 
.method attributes callingconv return methodname arguments { body }
 
We define two methods, Display() and Addition(). Display() returns the text "Hello " followed by the value of Name, and Addition() computes the sum of the two integers passed to it:
  public String Display()
  {
      return "Hello " + Name;
  }
  public int Addition(int x, int y)
  {
      return (x + y);
  }
The corresponding IL code for Display() is as follows:
  .method public hidebysig instance string
          Display() cil managed
  {
    // Code size       22 (0x16)
    .maxstack  2
    .locals init ([0] string CS$1$0000)
    IL_0000:  nop
    IL_0001:  ldstr      "Hello "
    IL_0006:  ldarg.0
    IL_0007:  call       instance string MathFun::get_Name()
    IL_000c:  call       string [mscorlib]System.String::Concat(string, string)
    IL_0011:  stloc.0
    IL_0012:  br.s       IL_0014

    IL_0014:  ldloc.0
    IL_0015:  ret
  }
The .method directive also accepts a number of attributes, including the following:
  • hidebysig: the method hides any base-class method with the same name and signature.
  • specialname: marks special methods such as the get_Property and set_Property accessors.
  • rtspecialname: the runtime treats the name specially; used for constructors (.ctor and .cctor).
  • cil or il: the method contains MSIL code.
  • native: the method contains platform-specific native code.
  • managed: the implementation is managed.
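In a compiled assembly, these method attributes are stored as bit flags. The following Python sketch translates the Flags value of a method's metadata row into ILAsm keywords, using the MethodAttributes bit values defined in ECMA-335 Partition II. The function method_keywords is hypothetical, and it covers only the access modifiers and the most common flags. The cil and managed keywords live in a separate ImplFlags field, where both correspond to the value 0, so they are not decoded here.

```python
# MethodAttributes values from ECMA-335 Partition II. The access level is the
# value masked with 0x0007; the flags are listed roughly in the order ILDASM
# prints them. method_keywords is a hypothetical helper.
ACCESS = {
    0x0000: "privatescope", 0x0001: "private", 0x0002: "famandassem",
    0x0003: "assembly", 0x0004: "family", 0x0005: "famorassem",
    0x0006: "public",
}
FLAGS = [
    (0x0080, "hidebysig"), (0x0100, "newslot"), (0x0800, "specialname"),
    (0x1000, "rtspecialname"), (0x0400, "abstract"), (0x0040, "virtual"),
    (0x0020, "final"), (0x0010, "static"), (0x2000, "pinvokeimpl"),
]


def method_keywords(flags: int) -> list[str]:
    """Translate a MethodDef Flags value into ILAsm keywords."""
    access = flags & 0x0007                      # MemberAccessMask
    words = [ACCESS.get(access, f"access(0x{access:X})")]
    words += [name for bit, name in FLAGS if flags & bit]
    return words


# public hidebysig specialname rtspecialname, as on MathFun's .ctor
print(method_keywords(0x1886))
```

The value 0x1886 decodes to the same attributes as the MathFun constructor, and 0x0091 decodes to private hidebysig static, the attributes of Main in MathFun.il.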
.field
 
The .field directive declares a new field, that is, state information for a class. Here is the syntax:

.field attributes type fieldname

For example, the C# declarations private readonly int x; and private readonly int y; correspond to the following IL:

.field private initonly int32 x
.field private initonly int32 y
 
Main() Method Directives
 
The method block can contain both directives and the implementation code (CIL).
 
.entrypoint
 
This directive designates a method as the entry point of the application. It must appear inside the body of that method, though it can be placed anywhere within it, and only one method in an executable may carry it.
 
.locals
 
The .locals directive declares a method's local variables, which are then available by name. The init keyword asks the runtime to zero-initialize them. For example, two integer local variables could be declared as follows:

.locals init ([0] int32 x,[1] int32 y)

In MathFun.il, Main declares a single local slot, obj, of type MathFun. It holds the object that newobj creates from the string passed to the class constructor:

.locals init ([0] class MathFun obj)
 
MSIL Instructions
 
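Each MSIL instruction is encoded as an opcode of one or two bytes, optionally followed by an operand. You work with these numeric opcodes, rather than the mnemonic names, mainly when producing code dynamically at run time, for example through the System.Reflection.Emit.OpCodes class. The following instructions make up the body of Main in MathFun.il: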
  IL_0000: nop
  IL_0001: ldstr "Ajay"
  IL_0006: newobj instance void MathFun::.ctor(string)
  IL_000b: stloc.0
  IL_000c: ldloc.0
  IL_000d: callvirt instance string MathFun::Display()
  IL_0012: call void [mscorlib]System.Console::WriteLine(string)
  IL_0017: nop
  IL_0018: ldstr "Addition is: {0}"
  IL_001d: ldloc.0
  IL_001e: ldc.i4.s 15
  IL_0020: ldc.i4.s 35
  IL_0022: callvirt instance int32 MathFun::Addition(int32, int32)
  IL_0027: box [mscorlib]System.Int32
  IL_002c: call void [mscorlib]System.Console::WriteLine(string, object)
  IL_0031: nop
  IL_0032: call valuetype [mscorlib]System.ConsoleKeyInfo [mscorlib]System.Console::ReadKey()
  IL_0037: pop
  IL_0038: ret
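In the binary, each of these mnemonics is stored as a numeric opcode followed by its operand bytes. The following Python sketch decodes a raw method body back into ILDASM-style lines. It is deliberately tiny. The table holds only the one-byte opcodes needed for the Addition method (values from the ECMA-335 Partition III opcode list), and two-byte opcodes with the 0xFE prefix are not handled. The ADDITION byte string was assembled by hand from that table to match the Addition listing; it was not dumped from MathFun.exe. The function disassemble is hypothetical.

```python
# Mnemonic and operand size (bytes) for the few one-byte opcodes this sketch
# needs; disassemble is a hypothetical helper, not a real disassembler.
ONE_BYTE_OPCODES = {
    0x00: ("nop", 0),
    0x03: ("ldarg.1", 0),
    0x04: ("ldarg.2", 0),
    0x06: ("ldloc.0", 0),
    0x0A: ("stloc.0", 0),
    0x2A: ("ret", 0),
    0x2B: ("br.s", 1),
    0x58: ("add", 0),
}


def disassemble(code: bytes) -> list[str]:
    """Decode raw IL bytes into 'IL_xxxx:  mnemonic [operand]' lines."""
    lines, pc = [], 0
    while pc < len(code):
        opcode = code[pc]
        if opcode == 0xFE:
            raise NotImplementedError("two-byte opcodes are not handled")
        if opcode not in ONE_BYTE_OPCODES:
            raise ValueError(f"unknown opcode 0x{opcode:02X} at IL_{pc:04x}")
        name, operand_size = ONE_BYTE_OPCODES[opcode]
        if name == "br.s":
            # Signed 1-byte offset, relative to the start of the next instruction.
            offset = int.from_bytes(code[pc + 1:pc + 2], "little", signed=True)
            text = f"{name:10} IL_{pc + 2 + offset:04x}"
        else:
            text = name
        lines.append(f"IL_{pc:04x}:  {text}")
        pc += 1 + operand_size
    return lines


# Hand-assembled: nop, ldarg.1, ldarg.2, add, stloc.0, br.s +0, ldloc.0, ret
ADDITION = bytes([0x00, 0x03, 0x04, 0x58, 0x0A, 0x2B, 0x00, 0x06, 0x2A])

for line in disassemble(ADDITION):
    print(line)
```

The nine bytes agree with the "Code size 9 (0x9)" comment in the Addition listing. The br.s operand of 0 jumps to IL_0007, the instruction immediately after it.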
Synopsis
 
This article has briefly touched on the most important features of the common language runtime and ILAsm. You now know how the runtime executes code and how an ILAsm program is written and compiled using either ILASM or Xamarin Studio. You also know how to define the basic components: classes, fields, properties, and methods. In the next articles of this series, we will examine the opcode specification in depth, along with the remaining crucial segments of the MSIL grammar.