.NET Reverse Engineering: Part 2

This article covers the .NET CLR and ILASM coding in more depth. It analyzes the opcode instructions in detail, shows how IL code relates to existing high-level C# code, and explains how to convert already-built C# code directly into IL code.

Before reading this article, I highly recommend reading the previous part:

1. .NET Binary Reverse Engineering: Part 1

Abstract

The first article of this series covered the most significant aspects of MSIL code, for instance how a program is authored in ILASM and how to define its basic components (classes, fields, functions and methods). In this article, we work with rudimentary IL code, the various data types, the opcode instructions and the remaining features of the .NET CLR and ILASM coding (interfaces, boxing and branching). After that, we analyze the opcode instructions in detail and see how to convert already-built C# code directly into IL code, which frees the programmer from writing complex IL instructions by hand.

CIL Data Types

Like other high-level languages, CIL has its own data types. The following table shows how each .NET base class type maps to the equivalent C# keyword, and how each C# keyword maps to its CIL representation and constant notation.



MSIL Code Labels

You may have noticed in the sample code of the earlier article that each line of implementation is prefixed with a special token of the form IL_XXXX (for example, IL_0000, IL_0002). These tokens are called code labels. They are completely optional and can be given any valid identifier name. When we dump the assembly source code using ILDASM.exe, it generates the code labels automatically. However, you may change them to make the code more descriptive. To dump an assembly together with its metadata token values, use the following command:

ILDASM /Token test.exe

This command produces output that shows the metadata tokens (such as /*06000001*/) alongside the IL_XXXX code labels, as in the following:

    .method /*06000001*/ private hidebysig static
            void  Main(string[] args) cil managed
    {
      .entrypoint
      .maxstack  2
      .locals init ([0] string str)

      IL_0000:  nop
      IL_0001:  ldstr      "Ajay"
      IL_0006:  stloc.0
      IL_0007:  ldstr      "Hello"
      IL_000c:  ldloc.0
      IL_000d:  call       string
      IL_0012:  call       void
      IL_0017:  nop
      IL_0018:  ret
    }

We can replace the generated labels with more descriptive names, as in the following. The names themselves do not matter because labels are optional, but each label must be a single identifier, so the words are joined with underscores:

    Nothing_1      :  nop
    Load_String    :  ldstr      "Ajay"
    Memory_Loca1   :  stloc.0
    Load_Constant  :  ldstr      "Hello"
    Memory_Loca2   :  ldloc.0
    Print_console  :  call       string
    Call_Method    :  call       void
    Nothing_2      :  nop
    Leave_Function :  ret

MSIL Opcodes

This section explains the various MSIL instructions, which are generally termed operation codes (opcodes). Some of them already appeared in the sample code of the previous article but have not been reviewed in detail so far. Opcodes are the CIL tokens used to build the implementation logic. For example, to load a string onto the evaluation stack you use the ldstr opcode rather than a friendlier name such as LoadString. The complete set of CIL opcodes can be grouped into the following three broad categories:

  • Retrieve Instructions
  • Control Instructions
  • Operations Instructions





Detailed Analysis of Opcode Instructions

Up to now we have concentrated on individual opcode instructions. To understand their meaning in more detail, we now present more complex sample code that combines several tasks, such as executing a loop and creating new class types. The goal is to see multiple instructions working together.

The following C# code adds its two integer parameters:

    public int Operation(int a, int b)
    {
        return (a + b);
    }

The preceding code compiles to the following CIL code, annotated here with the meaning of each opcode:

    .method public hidebysig instance int32 Operation(int32 a, int32 b) cil managed
    {
      .maxstack  2

      // Compiler-generated temporary that holds the return value.
      // (a and b are arguments, not locals, so they are not declared here.)
      .locals init ([0] int32 CS$1$0000)

      // No operation (placeholder emitted in debug builds)
      IL_0000:  nop

      // Push arguments a and b onto the evaluation stack
      // (argument 0 is "this" in an instance method)
      IL_0001:  ldarg.1
      IL_0002:  ldarg.2

      // Pop both values and push a + b
      IL_0003:  add

      // Store the result in local variable 0
      IL_0004:  stloc.0

      // Jump to IL_0007
      IL_0005:  br.s       IL_0007

      // Push local variable 0 back onto the stack
      IL_0007:  ldloc.0

      // Return the value on top of the stack
      IL_0008:  ret
    }
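
As a minimal sketch of how hand-written opcodes can live inside ordinary C# code, the same addition can be emitted at run time with System.Reflection.Emit. The class name EmitAddSketch and the method name "OperationSketch" are hypothetical and chosen only for illustration. Because a DynamicMethod is static, the arguments start at index 0 rather than 1.

```csharp
using System;
using System.Reflection.Emit;

public static class EmitAddSketch
{
    // Builds a delegate whose body is the hand-written IL
    // "ldarg.0, ldarg.1, add, ret".
    public static Func<int, int, int> BuildAdd()
    {
        // Hypothetical method name; static, so there is no "this" in slot 0.
        var method = new DynamicMethod(
            "OperationSketch", typeof(int), new[] { typeof(int), typeof(int) });

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0); // push a
        il.Emit(OpCodes.Ldarg_1); // push b
        il.Emit(OpCodes.Add);     // pop both, push a + b
        il.Emit(OpCodes.Ret);     // return the value on top of the stack

        return (Func<int, int, int>)method.CreateDelegate(typeof(Func<int, int, int>));
    }
}
```

Calling EmitAddSketch.BuildAdd()(2, 3) returns 5. Unlike the debug-build listing above, this sketch skips the nop, the temporary local and the br.s, since none of them is needed for the addition itself.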

Branching

Iteration in the C# programming language is done with the for, foreach and while loop constructs. The following C# code runs a for loop that is bounded at 7 iterations but breaks out as soon as the loop variable reaches 5, so it adds up the numbers from 0 to 5:

    public int braching()
    {
        int x = 0;
        for (int i = 0; i < 7; i++)
        {
            x = x + i;
            if (i == 5)
                break;
        }
        return x;
    }

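Branch opcodes such as br, brtrue, blt and bgt are used to change the flow of execution when some condition has been met. The debug-build listing below uses br.s and brtrue.s together with the comparison opcodes ceq and clt. Note that the disassembled variant prints x with Console.WriteLine() instead of returning it, which is why the method is declared void. The CIL is interpreted as in the following: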

    .method public hidebysig instance void braching() cil managed
    {
      .maxstack  2

      .locals init ([0] int32 x, [1] int32 i, [2] bool CS$4$0000)
      IL_0000:  nop
      // Push the constant 0 (initial value of x)
      IL_0001:  ldc.i4.0
      // Store it in local variable 0 (x)
      IL_0002:  stloc.0
      // Push the constant 0 (initial value of i)
      IL_0003:  ldc.i4.0
      // Store it in local variable 1 (i)
      IL_0004:  stloc.1
      // Jump to the loop condition at IL_001e
      IL_0005:  br.s       IL_001e

      // Start of the loop body
      IL_0007:  nop
      // Push x (local variable 0)
      IL_0008:  ldloc.0
      // Push i (local variable 1)
      IL_0009:  ldloc.1
      // Pop both values and push x + i
      IL_000a:  add
      // Store the sum in local variable 0 (x)
      IL_000b:  stloc.0
      // Push i (local variable 1)
      IL_000c:  ldloc.1
      // Push the constant 5
      IL_000d:  ldc.i4.5
      // Test for equality: pushes 1 if i == 5, otherwise 0
      IL_000e:  ceq
      // Push the constant 0
      IL_0010:  ldc.i4.0
      // Compare with 0, which negates the previous result (i != 5)
      IL_0011:  ceq
      // Pop the result and store it in local variable 2
      IL_0013:  stloc.2
      // Push local variable 2
      IL_0014:  ldloc.2
      // If i != 5, continue with the increment at IL_0019
      IL_0015:  brtrue.s   IL_0019
      // Otherwise break out of the loop: jump to IL_0026
      IL_0017:  br.s       IL_0026
      // No operation
      IL_0019:  nop
      // Push i (local variable 1)
      IL_001a:  ldloc.1
      // Push the constant 1
      IL_001b:  ldc.i4.1
      // Pop both values and push i + 1
      IL_001c:  add
      // Store the result in local variable 1 (i)
      IL_001d:  stloc.1
      // Loop condition: push i (local variable 1)
      IL_001e:  ldloc.1
      // Push the constant 7
      IL_001f:  ldc.i4.7
      // Compare less than: pushes 1 if i < 7, otherwise 0
      IL_0020:  clt
      // Pop the result and store it in local variable 2
      IL_0022:  stloc.2
      // Push local variable 2
      IL_0023:  ldloc.2
      // Branch back to IL_0007 if the value is non-zero
      IL_0024:  brtrue.s   IL_0007
      // Push x (local variable 0)
      IL_0026:  ldloc.0

      // Call the Console.WriteLine() method
      IL_0027:  call       void [mscorlib]System.Console::WriteLine(int32)
      IL_002c:  nop
      IL_002d:  ret
    }
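
To see the compact branch opcodes (blt, beq) in action, here is a hedged sketch that emits the same loop at run time with System.Reflection.Emit and returns x instead of printing it. The names EmitLoopSketch and "BranchingSketch" are hypothetical. DefineLabel() and MarkLabel() play the role of the IL_XXXX code labels, and the generator computes the branch offsets.

```csharp
using System;
using System.Reflection.Emit;

public static class EmitLoopSketch
{
    // Emits: x = 0; for (i = 0; i < 7; i++) { x += i; if (i == 5) break; } return x;
    public static Func<int> BuildLoop()
    {
        var method = new DynamicMethod("BranchingSketch", typeof(int), Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator();

        LocalBuilder x = il.DeclareLocal(typeof(int)); // local 0
        LocalBuilder i = il.DeclareLocal(typeof(int)); // local 1
        Label body = il.DefineLabel();
        Label check = il.DefineLabel();
        Label done = il.DefineLabel();

        il.Emit(OpCodes.Ldc_I4_0);
        il.Emit(OpCodes.Stloc, x);   // x = 0
        il.Emit(OpCodes.Ldc_I4_0);
        il.Emit(OpCodes.Stloc, i);   // i = 0
        il.Emit(OpCodes.Br, check);  // jump to the loop condition

        il.MarkLabel(body);
        il.Emit(OpCodes.Ldloc, x);
        il.Emit(OpCodes.Ldloc, i);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Stloc, x);   // x = x + i
        il.Emit(OpCodes.Ldloc, i);
        il.Emit(OpCodes.Ldc_I4_5);
        il.Emit(OpCodes.Beq, done);  // if (i == 5) break
        il.Emit(OpCodes.Ldloc, i);
        il.Emit(OpCodes.Ldc_I4_1);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Stloc, i);   // i++

        il.MarkLabel(check);
        il.Emit(OpCodes.Ldloc, i);
        il.Emit(OpCodes.Ldc_I4_7);
        il.Emit(OpCodes.Blt, body);  // loop again while i < 7

        il.MarkLabel(done);
        il.Emit(OpCodes.Ldloc, x);
        il.Emit(OpCodes.Ret);        // return x

        return (Func<int>)method.CreateDelegate(typeof(Func<int>));
    }
}
```

EmitLoopSketch.BuildLoop()() returns 15 (0 + 1 + 2 + 3 + 4 + 5). Here a single blt or beq replaces the clt/ceq, stloc, ldloc, brtrue.s sequences of the debug-build listing.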

Boxing

Boxing is the process of converting a value type to a reference type (System.Object). When we box a value, the CLR allocates a new object on the heap and copies the value (10 in this example) into that instance. The opposite operation is unboxing, which converts the value held in the reference back into the corresponding value type, as in the following:

    static void BoxUnbox()
    {
        int x = 10;

        // Boxed
        object bObj = x;

        // Unboxed
        int y = (int)bObj;
        Console.WriteLine(y);
    }

If you examine the compiled code using ILDASM, you will find the box and unbox.any entries in the CIL code, as in the following:

    .method private hidebysig static void  BoxUnbox() cil managed
    {
      .maxstack  1
      .locals init ([0] int32 x, [1] object bObj, [2] int32 y)
      IL_0000:  nop

      // Push the integer constant 10
      IL_0001:  ldc.i4.s   10
      // Store it in local variable 0 (x)
      IL_0003:  stloc.0
      // Push x (local variable 0)
      IL_0004:  ldloc.0

      // Boxing (value to object)
      IL_0005:  box        [mscorlib]System.Int32

      // Store the object reference in local variable 1 (bObj)
      IL_000a:  stloc.1
      // Push bObj (local variable 1)
      IL_000b:  ldloc.1

      // Unboxing (object to value)
      IL_000c:  unbox.any  [mscorlib]System.Int32

      // Store the value in local variable 2 (y)
      IL_0011:  stloc.2
      // Push y (local variable 2)
      IL_0012:  ldloc.2
      IL_0013:  call       void [mscorlib]System.Console::WriteLine(int32)
      IL_0018:  nop
      IL_0019:  ret
    }
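
The copy semantics described above can be checked with a short C# sketch. The class and method names here are hypothetical. The first method shows that the box holds its own copy of the value, and the second shows that unboxing must name the exact type that was boxed.

```csharp
using System;

public static class BoxingSketch
{
    // Returns the value held in the box after the original variable changes.
    public static int BoxedCopyIsIndependent()
    {
        int x = 10;
        object boxed = x;   // box: copies 10 into a new heap object
        x = 20;             // changes only the local variable
        return (int)boxed;  // unbox.any: the box still holds 10
    }

    // Unboxing a boxed int as long fails at run time.
    public static bool UnboxToWrongTypeThrows()
    {
        object boxed = 10;  // boxed System.Int32
        try
        {
            Console.WriteLine((long)boxed); // never prints: the cast throws
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```

BoxedCopyIsIndependent() returns 10, and UnboxToWrongTypeThrows() returns true because a boxed Int32 can only be unboxed as Int32.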

Interface

An interface is defined in MSIL by adding the interface keyword to a .class declaration. An interface cannot declare instance fields, and its member functions must be public, abstract and virtual. A class uses the implements keyword to list the interfaces that it implements, as in the following:

    .assembly CILComplexTest
    {
    }
    .assembly extern mscorlib
    {
      .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )
      .ver 4:0:0:0
    }
    // Interface definition
    .class interface public abstract auto ansi CILComplexTest.Repository
    {
      .method public hidebysig newslot abstract virtual  instance void  Display() cil managed
      {
      } // end of method Repository::Display

    } // end of class CILComplexTest.Repository

    // Class that implements the interface
    .class public auto ansi beforefieldinit CILComplexTest.test  extends [mscorlib]System.Object
                                                                     implements CILComplexTest.Repository
    {
      .method public hidebysig newslot virtual final  instance void  Display() cil managed
      {
        // Code size       13 (0xd)
        .maxstack  8
        IL_0000:  nop
        IL_0001:  ldstr      "Hello"
        IL_0006:  call       void [mscorlib]System.Console::WriteLine(string)
        IL_000b:  nop
        IL_000c:  ret
      } // end of method test::Display

      // Constructor, required by the newobj instruction in Main
      .method public hidebysig specialname rtspecialname  instance void  .ctor() cil managed
      {
        .maxstack  8
        IL_0000:  ldarg.0
        IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
        IL_0006:  ret
      } // end of method test::.ctor

    } // end of class CILComplexTest.test

    // Main class
    .class private auto ansi beforefieldinit CILComplexTest.Program extends [mscorlib]System.Object
    {
      .method private hidebysig static void  Main(string[] args) cil managed
      {
        .entrypoint
        // Code size       13 (0xd)
        .maxstack  8
        IL_0000:  nop
        IL_0001:  newobj     instance void CILComplexTest.test::.ctor()
        IL_0006:  call       instance void CILComplexTest.test::Display()
        IL_000b:  nop
        IL_000c:  ret
      } // end of method Program::Main

      // Constructor
      .method public hidebysig specialname rtspecialname  instance void  .ctor() cil managed
      {
        // Code size       7 (0x7)
        .maxstack  8
        IL_0000:  ldarg.0
        IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
        IL_0006:  ret
      } // end of method Program::.ctor
    }
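
For orientation, the following C# sketch roughly corresponds to the interface listing above. It is an assumption about what the source could look like, not the author's original file. The namespace CILComplexTestSketch and the helper class InterfaceDemo are hypothetical, while Repository, test and Display() are taken from the listing.

```csharp
using System;

namespace CILComplexTestSketch
{
    // Corresponds to ".class interface public abstract ... Repository".
    public interface Repository
    {
        void Display(); // emitted as "newslot abstract virtual"
    }

    // Corresponds to "... test extends System.Object implements Repository".
    public class test : Repository
    {
        // A non-virtual C# method that implements an interface member
        // is emitted as "newslot virtual final".
        public void Display()
        {
            Console.WriteLine("Hello");
        }
    }

    public static class InterfaceDemo
    {
        // Creates the object and calls Display() through the interface.
        public static void Run()
        {
            Repository repo = new test();
            repo.Display();
        }
    }
}
```

Running InterfaceDemo.Run() prints Hello. A C# compiler would normally emit callvirt for this call, whereas the hand-written listing uses a plain call on the concrete class.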

MSIL Code Generation

The .NET Framework offers the ILDASM.exe utility, which disassembles a compiled C# assembly into MSIL code. This spares us the hassle of writing CIL by hand, an error-prone task because every instruction has its own terse syntax and its own precise meaning.

Suppose we want to write a program in CIL opcode instructions that simply displays a "Hello Ajay" message on the screen. Despite the simple nature of the program, writing it directly in MSIL is still awkward because opcode instructions are not in a user-friendly, English-like format. However, there is a trick. First write the implementation in user-friendly C# and compile the project. The corresponding executable is created in the Bin/Debug folder.

    using System;

    namespace CILComplexTest
    {
        class xyz
        {
            private string msg;

            public xyz(string msg)
            {
                this.msg = msg;
            }

            public string display()
            {
                return "Hello " + msg;
            }
        }

        class Program
        {
            static void Main(string[] args)
            {
                xyz obj = new xyz("Ajay");
                Console.WriteLine(obj.display());

                // Wait for a key press (matches the ReadKey call in the IL below)
                Console.ReadKey();
            }
        }
    }

Now open the Visual Studio Command Prompt, go to the project's Bin/Debug folder and execute the following command to convert the compiled C# program into MSIL code:

ILDASM CILComplexTest.exe /out:test.il



Notice that the test.il file has been created in the Bin/Debug folder. It contains the same implementation as its C# counterpart, expressed as IL instructions. You can open this file in any editor, modify it, and compile it again using the ILASM utility. The automatically generated IL code is as follows:

    //Microsoft (R) .NET Framework IL Disassembler.  Version 4.0.30319.1
    //Copyright (c) Microsoft Corporation.  All rights reserved.
    //Metadata version: v4.0.30319

    .assembly extern mscorlib
    {
      .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )
      .ver 4:0:0:0
    }
    .assembly CILComplexTest
    {
      // ..
      .hash algorithm 0x00008004
      .ver 1:0:0:0
    }
    .module CILComplexTest.exe
    // MVID: {631F60E4-6E43-4355-BC70-DAF16F1FE33A}
    .imagebase 0x00400000
    .file alignment 0x00000200
    .stackreserve 0x00100000
    .subsystem 0x0003       // WINDOWS_CUI
    .corflags 0x00000003    //  ILONLY 32BITREQUIRED
    // Image base: 0x003E0000


    // =============== CLASS MEMBERS DECLARATION ===================

    .class private auto ansi beforefieldinit CILComplexTest.xyz
                                 extends [mscorlib]System.Object
    {
      .field private string msg
      .method public hidebysig specialname rtspecialname instance void  .ctor(string msg) cil managed
      {
        // Code size       17 (0x11)
        .maxstack  8
        IL_0000:  ldarg.0
        IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
        IL_0006:  nop
        IL_0007:  nop
        IL_0008:  ldarg.0
        IL_0009:  ldarg.1
        IL_000a:  stfld      string CILComplexTest.xyz::msg
        IL_000f:  nop
        IL_0010:  ret
      } // end of method xyz::.ctor

      .method public hidebysig instance string display() cil managed
      {
        // Code size       22 (0x16)
        .maxstack  2
        .locals init (string V_0)
        IL_0000:  nop
        IL_0001:  ldstr      "Hello "
        IL_0006:  ldarg.0
        IL_0007:  ldfld      string CILComplexTest.xyz::msg
        IL_000c:  call       string [mscorlib]System.String::Concat(string,string)
        IL_0011:  stloc.0
        IL_0012:  br.s       IL_0014

        IL_0014:  ldloc.0
        IL_0015:  ret
      } // end of method xyz::display

    } // end of class CILComplexTest.xyz

    .class private auto ansi beforefieldinit CILComplexTest.Program
           extends [mscorlib]System.Object
    {
      .method private hidebysig static void  Main(string[] args) cil managed
      {
        .entrypoint
        // Code size       31 (0x1f)
        .maxstack  2
        .locals init (class CILComplexTest.xyz V_0)
        IL_0000:  nop
        IL_0001:  ldstr      "Ajay"
        IL_0006:  newobj     instance void CILComplexTest.xyz::.ctor(string)
        IL_000b:  stloc.0
        IL_000c:  ldloc.0
        IL_000d:  callvirt   instance string CILComplexTest.xyz::display()
        IL_0012:  call       void [mscorlib]System.Console::WriteLine(string)
        IL_0017:  nop
        IL_0018:  call       valuetype [mscorlib]System.ConsoleKeyInfo [mscorlib]System.Console::ReadKey()
        IL_001d:  pop
        IL_001e:  ret
      } // end of method Program::Main

      .method public hidebysig specialname rtspecialname instance void  .ctor() cil managed
      {
        // Code size       7 (0x7)
        .maxstack  8
        IL_0000:  ldarg.0
        IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
        IL_0006:  ret
      } // end of method Program::.ctor

    } // end of class CILComplexTest.Program

Summary

This article has provided an overview of the CIL data types and opcode instructions. We analyzed the meaning of individual opcode instructions in detail and saw how the remaining constructs, such as boxing, unboxing, branching and interfaces, are expressed in CIL opcodes. Finally, we took an introductory look at the process of converting existing C# code into MSIL opcode instructions using the ILDASM utility.