.NET Reverse Engineering: Part 2

Before reading this article, I highly recommend reading the previous part.

.NET Binary Reverse Engineering: Part 1

Abstract

The first article of this series covered the most significant aspects of MSIL code: how a program is authored in ILASM code and how its basic components (classes, fields, functions, and methods) are defined. This article works through rudimentary IL code, the CIL data types, opcode instructions, and the remaining features (interfaces, boxing, and branching) of the .NET CLR and ILASM coding. We then analyze individual opcode instructions in detail, look at how to integrate IL code into existing high-level C# code, and show how to convert already built C# code into IL code directly, which frees the programmer from writing complex IL instructions by hand.

CIL Data Types

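Like other high-level languages, CIL has its own set of data types. The following table shows how each .NET base class type maps to the equivalent C# keyword and to its CIL representation.

CIL Data types

    .NET base class type    C# keyword    CIL representation
    System.SByte            sbyte         int8
    System.Byte             byte          unsigned int8
    System.Int16            short         int16
    System.UInt16           ushort        unsigned int16
    System.Int32            int           int32
    System.UInt32           uint          unsigned int32
    System.Int64            long          int64
    System.UInt64           ulong         unsigned int64
    System.Char             char          char
    System.Single           float         float32
    System.Double           double        float64
    System.Boolean          bool          bool
    System.String           string        string
    System.Object           object        object
    System.Void             void          void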
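
The mapping between C# keywords and .NET base class types described above can be checked from C# itself. The following is a minimal sketch; the class name TypeAliasDemo and the helper SameType are hypothetical names chosen for illustration.

```csharp
using System;

public static class TypeAliasDemo
{
    // Returns true when the C# keyword and the framework type are the same type.
    // (Hypothetical helper, not part of any library.)
    public static bool SameType<TKeyword, TFramework>()
    {
        return typeof(TKeyword) == typeof(TFramework);
    }

    public static void Main()
    {
        Console.WriteLine(SameType<int, Int32>());     // True
        Console.WriteLine(SameType<float, Single>());  // True
        Console.WriteLine(SameType<long, Int32>());    // False
        Console.WriteLine(typeof(int).FullName);       // System.Int32
        Console.WriteLine(sizeof(long));               // 8
    }
}
```

The keyword is only an alias, which is why the compiler emits the same CIL type (int32, float32, and so on) whichever spelling the source uses.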

MSIL Code Labels

Perhaps you noticed in the sample code of the earlier article that each line of implementation is prefixed with a token of the form IL_XXXX (for example, IL_0000, IL_0002). These tokens are called code labels. They are completely optional and can be given any name that is a valid identifier. When we dump an assembly using ILDASM.exe, it generates the code labels automatically from the byte offset of each instruction. However, you may rename them to make the code more descriptive. ILDASM can also show the metadata tokens of classes and members with the following command.

ILDASM /TOKENS test.exe

The output shows the metadata token of the method (the /*06000001*/ comment) along with the IL_XXXX code labels, as in the following.

.method /*06000001*/ private hidebysig static 
    void Main(string[] args) cil managed
{
    .entrypoint
    .maxstack 2
    .locals init ([0] string str)
    
    IL_0000: nop
    IL_0001: ldstr "Ajay"
    IL_0006: stloc.0
    IL_0007: ldstr "Hello"
    IL_000c: ldloc.0
    IL_000d: call string
    IL_0012: call void
    IL_0017: nop
    IL_0018: ret
}
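
The /*06000001*/ comment in the listing is the metadata token of Main. As a rough illustration, the same kind of token can be read at run time through reflection. This is a sketch only; TokenDemo, MainToken, and TableId are hypothetical names, and the exact token value depends on the assembly being inspected.

```csharp
using System;
using System.Reflection;

public static class TokenDemo
{
    // Returns the metadata token of this class's own Main method.
    public static int MainToken()
    {
        MethodInfo main = typeof(TokenDemo).GetMethod(
            "Main", BindingFlags.Public | BindingFlags.Static);
        return main.MetadataToken;
    }

    // The top byte of a token identifies the metadata table it points into.
    public static int TableId(int token)
    {
        return (int)((uint)token >> 24);
    }

    public static void Main()
    {
        int token = MainToken();
        Console.WriteLine("0x{0:X8}", token);        // value varies per assembly
        Console.WriteLine(TableId(token) == 0x06);   // True: 0x06 is the MethodDef table
    }
}
```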

We can replace the generated labels with more descriptive names, as in the following. The names carry no meaning for the runtime because labels are optional, but each one must be a valid identifier, so it cannot contain spaces.

    Nothing_1: nop
    Load_String: ldstr "Ajay"
    Memory_Loca1: stloc.0
    Load_Constant: ldstr "Hello"
    Memory_Loca2: ldloc.0
    Print_console: call string
    Call_Method: call void
    Nothing_2: nop
    Leave_Function: ret
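
Labels matter mainly as branch targets. The sketch below, which assumes System.Reflection.Emit is available on the target runtime, emits a small method at run time and uses a label whose name exists only in the C# source. LabelDemo, BuildMax, and returnSecond are hypothetical names used for illustration.

```csharp
using System;
using System.Reflection.Emit;

public static class LabelDemo
{
    // Builds a delegate equivalent to: (a, b) => a < b ? b : a
    public static Func<int, int, int> BuildMax()
    {
        var method = new DynamicMethod(
            "Max", typeof(int), new[] { typeof(int), typeof(int) });
        ILGenerator il = method.GetILGenerator();

        // A label is just a named position in the instruction stream.
        Label returnSecond = il.DefineLabel();

        il.Emit(OpCodes.Ldarg_0);            // push a
        il.Emit(OpCodes.Ldarg_1);            // push b
        il.Emit(OpCodes.Blt, returnSecond);  // if a < b, jump to the label
        il.Emit(OpCodes.Ldarg_0);            // otherwise return a
        il.Emit(OpCodes.Ret);

        il.MarkLabel(returnSecond);          // the label is placed here
        il.Emit(OpCodes.Ldarg_1);            // return b
        il.Emit(OpCodes.Ret);

        return (Func<int, int, int>)method.CreateDelegate(typeof(Func<int, int, int>));
    }

    public static void Main()
    {
        Func<int, int, int> max = BuildMax();
        Console.WriteLine(max(3, 9));  // 9
        Console.WriteLine(max(9, 3));  // 9
    }
}
```

The emitted code contains only a branch offset; the label name never reaches the assembly, which is why ILDASM has to invent names such as IL_0007 when it disassembles.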

MSIL Opcodes

This section explains the MSIL instructions, generally termed operation codes (opcodes). Some of them already appeared in the sample code of the previous article but have not been reviewed in detail so far. Opcodes are the CIL tokens from which the implementation logic is built. For example, to load a string onto the evaluation stack you use the terse ldstr opcode rather than a friendlier name such as LoadString. The complete set of CIL opcodes can be grouped into the following three broad categories.

  • Retrieve Instructions
  • Control Instructions
  • Operations Instructions
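
To get a feel for how large the opcode set is, the opcodes can be listed from C# through the System.Reflection.Emit.OpCodes class. This is a sketch under the assumption that Reflection.Emit is available. It groups opcodes by the framework's own FlowControl classification, which is not the same as the three categories listed above. OpCodeCatalog and All are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Reflection.Emit;

public static class OpCodeCatalog
{
    // Collects every OpCode exposed as a public static field of OpCodes.
    public static OpCode[] All()
    {
        return typeof(OpCodes)
            .GetFields(BindingFlags.Public | BindingFlags.Static)
            .Where(f => f.FieldType == typeof(OpCode))
            .Select(f => (OpCode)f.GetValue(null))
            .ToArray();
    }

    public static void Main()
    {
        // Print one line per flow-control group with a sample opcode name.
        foreach (var group in All().GroupBy(op => op.FlowControl))
        {
            Console.WriteLine("{0}: {1} opcodes, e.g. {2}",
                group.Key, group.Count(), group.First().Name);
        }

        Console.WriteLine(OpCodes.Ldstr.Name);            // ldstr
        Console.WriteLine(OpCodes.Br_S.FlowControl);      // Branch
        Console.WriteLine(OpCodes.Brtrue_S.FlowControl);  // Cond_Branch
    }
}
```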

Retrieve Instructions

Operations Instructions

Detailed Analysis of Opcode Instructions

Up to now we have concentrated on individual opcode instructions. To understand how they work together, the following sections present more complex sample code that covers several tasks, such as arithmetic, executing a loop, and creating new class types. Our prime motive is to encounter multiple instructions in context.

The following C# code adds its two integer parameters.

public int Operation(int a, int b)
{
    return (a + b);
}

The compiler converts the preceding code into the following CIL code, annotated here opcode by opcode.

.method public hidebysig instance int32 Operation(int32 a, int32 b) cil managed
{
    .maxstack 2

    // Compiler-generated local that temporarily holds the return value
    // (a and b are arguments, not locals)
    .locals init ([0] int32 CS$1$0000)

    // No operation
    IL_0000: nop

    // Push arguments a and b onto the evaluation stack
    // (ldarg.0 is the 'this' reference of an instance method)
    IL_0001: ldarg.1
    IL_0002: ldarg.2

    // Pop both values, add them, and push the result
    IL_0003: add

    // Pop the result into local variable 0
    IL_0004: stloc.0

    // Jump to IL_0007
    IL_0005: br.s IL_0007

    // Push local variable 0 back onto the stack
    IL_0007: ldloc.0

    // Return to the caller with the value on top of the stack
    IL_0008: ret
}
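
The same add sequence can be tried out without ILASM. The sketch below assumes System.Reflection.Emit is available and emits the core of the method at run time; AddDemo and BuildAdder are hypothetical names. Because a DynamicMethod created this way is static, the arguments start at index 0, whereas the instance method above uses ldarg.1 and ldarg.2.

```csharp
using System;
using System.Reflection.Emit;

public static class AddDemo
{
    // Emits: ldarg.0, ldarg.1, add, ret
    public static Func<int, int, int> BuildAdder()
    {
        var method = new DynamicMethod(
            "Operation", typeof(int), new[] { typeof(int), typeof(int) });
        ILGenerator il = method.GetILGenerator();

        il.Emit(OpCodes.Ldarg_0);  // push a
        il.Emit(OpCodes.Ldarg_1);  // push b
        il.Emit(OpCodes.Add);      // pop both, push a + b
        il.Emit(OpCodes.Ret);      // return the value on top of the stack

        return (Func<int, int, int>)method.CreateDelegate(typeof(Func<int, int, int>));
    }

    public static void Main()
    {
        Func<int, int, int> add = BuildAdder();
        Console.WriteLine(add(2, 3));  // 5
    }
}
```

The nop, stloc.0, br.s, and ldloc.0 instructions in the listing are artifacts of a debug build; the four instructions emitted here are enough to compute the result.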

Branching

Iteration in C# is done using the for, foreach, and while loop constructs. The following C# code runs a for loop that is bounded at 7 iterations, adds the loop variable to x on each pass, and breaks out as soon as the loop variable reaches 5, so x ends up as 0 + 1 + 2 + 3 + 4 + 5 = 15.

public void braching()
{
    int x = 0;
    for (int i = 0; i < 7; i++)
    {
        x = x + i;
        if (i == 5)
            break;
    }
    // Print the result (matches the Console.WriteLine call in the CIL)
    Console.WriteLine(x);
}

CIL provides branch opcodes such as br, brtrue, beq, blt, and bgt to transfer control when a condition is met. For this method, the compiler emits ceq and clt comparisons followed by br.s and brtrue.s branches. The annotated CIL is as follows.

.method public hidebysig instance void braching() cil managed
{
    .maxstack 2

    .locals init ([0] int32 x, [1] int32 i, [2] bool CS$4$0000)
    IL_0000: nop

    // Push the constant 0 (initial value of x)
    IL_0001: ldc.i4.0
    // Pop it into local 0 (x)
    IL_0002: stloc.0
    // Push the constant 0 (initial value of i)
    IL_0003: ldc.i4.0
    // Pop it into local 1 (i)
    IL_0004: stloc.1
    // Jump to the loop condition at IL_001e
    IL_0005: br.s IL_001e

    // Start of the loop body
    IL_0007: nop
    // Push x
    IL_0008: ldloc.0
    // Push i
    IL_0009: ldloc.1
    // Pop both, push x + i
    IL_000a: add
    // Pop the sum into local 0 (x)
    IL_000b: stloc.0
    // Push i
    IL_000c: ldloc.1
    // Push the constant 5
    IL_000d: ldc.i4.5
    // Push 1 if i == 5, otherwise 0
    IL_000e: ceq
    // Push the constant 0
    IL_0010: ldc.i4.0
    // Compare with 0 to negate: the result is 1 if i != 5
    IL_0011: ceq
    // Pop the result into local 2
    IL_0013: stloc.2
    // Push local 2
    IL_0014: ldloc.2
    // If i != 5, continue with the increment at IL_0019
    IL_0015: brtrue.s IL_0019
    // Otherwise break out of the loop to IL_0026
    IL_0017: br.s IL_0026
    // No operation
    IL_0019: nop
    // Push i
    IL_001a: ldloc.1
    // Push the constant 1
    IL_001b: ldc.i4.1
    // Pop both, push i + 1
    IL_001c: add
    // Pop the result into local 1 (i)
    IL_001d: stloc.1
    // Loop condition: push i
    IL_001e: ldloc.1
    // Push the constant 7
    IL_001f: ldc.i4.7
    // Push 1 if i < 7, otherwise 0
    IL_0020: clt
    // Pop the result into local 2
    IL_0022: stloc.2
    // Push local 2
    IL_0023: ldloc.2
    // If i < 7, jump back to the loop body at IL_0007
    IL_0024: brtrue.s IL_0007
    // Push x
    IL_0026: ldloc.0

    // Call Console.WriteLine(int32) with x
    IL_0027: call void [mscorlib]System.Console::WriteLine(int32)
    IL_002c: nop
    IL_002d: ret
}
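
One way to read the branch targets in the listing above is to write the loop back in C# with explicit goto statements, one C# label per IL label. This is only an illustrative sketch; BranchDemo and SumWithGoto are hypothetical names, and the method returns the sum instead of printing it so that the result is easy to check.

```csharp
using System;

public static class BranchDemo
{
    // Mirrors the control flow of the CIL listing with goto statements.
    public static int SumWithGoto()
    {
        int x = 0;
        int i = 0;
        goto Check;                     // br.s IL_001e

    Body:                               // IL_0007
        x = x + i;
        if (i != 5) goto Increment;     // brtrue.s IL_0019
        goto Exit;                      // br.s IL_0026 (the break)

    Increment:                          // IL_0019
        i = i + 1;

    Check:                              // IL_001e
        if (i < 7) goto Body;           // brtrue.s IL_0007

    Exit:                               // IL_0026
        return x;
    }

    public static void Main()
    {
        Console.WriteLine(SumWithGoto());  // 15
    }
}
```

Note that the loop condition sits at the bottom of the emitted code: the first br.s jumps straight to the check, and every later pass falls through the increment into it.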

Boxing

Boxing is the process of assigning a value type to a reference type (System.Object). When we box a value, the CLR allocates a new object on the heap and copies the value (10 in the example below) into that instance. The opposite operation, unboxing, converts the value held in the object reference back into the corresponding value type, as in the following.

static void BoxUnbox()
{
    int x = 10;

    // Boxing: the int is copied into a new object on the heap
    object bObj = x;

    // Unboxing: the value is copied back out of the object
    int y = (int)bObj;
    Console.WriteLine(y);
}

If you examine the compiled code using ILDASM, you will find the box and unbox instructions in the CIL code, as in the following.

.method private hidebysig static void BoxUnbox() cil managed
{
    .maxstack 1
    .locals init ([0] int32 x, [1] object bObj, [2] int32 y)

    IL_0000: nop

    // Push the constant 10
    IL_0001: ldc.i4.s 10
    // Pop it into local 0 (x)
    IL_0003: stloc.0
    // Push x
    IL_0004: ldloc.0

    // Boxing (value to object)
    IL_0005: box [mscorlib]System.Int32

    // Pop the object reference into local 1 (bObj)
    IL_000a: stloc.1
    // Push bObj
    IL_000b: ldloc.1

    // Unboxing (object to value)
    IL_000c: unbox.any [mscorlib]System.Int32

    // Pop the value into local 2 (y)
    IL_0011: stloc.2
    // Push y as the argument for WriteLine
    IL_0012: ldloc.2

    IL_0013: call void [mscorlib]System.Console::WriteLine(int32)
    IL_0018: nop
    IL_0019: ret
}
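
Two consequences of the box and unbox.any instructions are easy to observe from C#: the box holds a copy of the value, and unboxing must name the exact value type that was boxed. The following is a minimal sketch; BoxingDemo and its method names are hypothetical.

```csharp
using System;

public static class BoxingDemo
{
    // The box holds a copy, so changing x afterwards does not affect it.
    public static int BoxedValueAfterChange()
    {
        int x = 10;
        object boxed = x;   // box: copies 10 to the heap
        x = 20;             // changes only the local variable
        return (int)boxed;  // unbox: still 10
    }

    // Unboxing to a different value type fails at run time.
    public static bool UnboxToWrongTypeThrows()
    {
        object boxed = 10;  // a boxed System.Int32
        try
        {
            long wrong = (long)boxed;  // not an Int64 inside the box
            return wrong < 0;          // not reached
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(BoxedValueAfterChange());   // 10
        Console.WriteLine(UnboxToWrongTypeThrows());  // True
    }
}
```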

Interface

An interface is defined in MSIL by adding the interface keyword to a .class directive. An interface cannot declare instance fields, and its member functions must be public, abstract, and virtual. A class uses the implements keyword to list the interfaces that it implements, as in the following.

.assembly CILComplexTest
{
}

.assembly extern mscorlib
{
    .publickeytoken = (B77A5C561934E089)
    .ver 4:0:0:0
}

// Interface Definition
.class interface public abstract auto ansi CILComplexTest.Repository
{
    .method public hidebysig newslot abstract virtual instance void Display() cil managed
    {
    }
} // end of class CILComplexTest.Repository

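// Class that implements the interface
.class public auto ansi beforefieldinit CILComplexTest.test extends [mscorlib]System.Object implements CILComplexTest.Repository
{
    // Constructor, required by the newobj instruction in Main
    .method public hidebysig specialname rtspecialname instance void .ctor() cil managed
    {
        .maxstack 8
        IL_0000: ldarg.0
        IL_0001: call instance void [mscorlib]System.Object::.ctor()
        IL_0006: ret
    } // end of method test::.ctor

    // Display() method
    .method public hidebysig newslot virtual final instance void Display() cil managed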
    {
        // Code size 13 (0xd)
        .maxstack 8
        IL_0000: nop
        IL_0001: ldstr "Hello"
        IL_0006: call void [mscorlib]System.Console::WriteLine(string)
        IL_000b: nop
        IL_000c: ret
    } // end of method test::Display

} // end of class CILComplexTest.test

// Main class
.class private auto ansi beforefieldinit CILComplexTest.Program extends [mscorlib]System.Object
{
    .method private hidebysig static void Main(string[] args) cil managed
    {
        .entrypoint
        // Code size 13 (0xd)
        .maxstack 8
        IL_0000: nop
        IL_0001: newobj instance void CILComplexTest.test::.ctor()
        IL_0006: call instance void CILComplexTest.test::Display()
        IL_000b: nop
        IL_000c: ret
    } // end of method Program::Main

    // constructor
    .method public hidebysig specialname rtspecialname instance void .ctor() cil managed
    {
        // Code size 7 (0x7)
        .maxstack 8
        IL_0000: ldarg.0
        IL_0001: call instance void [mscorlib]System.Object::.ctor()
        IL_0006: ret
    } // end of method Program::.ctor
} // end of class CILComplexTest.Program
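
The attributes seen in the listing (abstract and virtual on the interface method, virtual final on the implementation) can also be observed from C# through reflection. The sketch below is a rough C# counterpart of the IL; the class is named Test here rather than test, and InterfaceDemo is a hypothetical name.

```csharp
using System;
using System.Reflection;

public interface Repository
{
    void Display();
}

public class Test : Repository
{
    public void Display()
    {
        Console.WriteLine("Hello");
    }
}

public static class InterfaceDemo
{
    public static void Main()
    {
        Type iface = typeof(Repository);
        MethodInfo declared = iface.GetMethod("Display");
        MethodInfo implemented = typeof(Test).GetMethod("Display");

        Console.WriteLine(iface.IsInterface);      // True
        Console.WriteLine(iface.IsAbstract);       // True
        Console.WriteLine(declared.IsAbstract);    // True
        Console.WriteLine(declared.IsVirtual);     // True
        Console.WriteLine(implemented.IsVirtual);  // True
        Console.WriteLine(implemented.IsFinal);    // True: emitted as "virtual final"

        Repository r = new Test();
        r.Display();                               // Hello
    }
}
```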

MSIL Code Generation

The .NET Framework SDK includes the ILDASM.exe utility, which disassembles a compiled assembly into MSIL code. This spares us the hassle of writing CIL by hand, which is error-prone because the instructions are terse and each has its own syntax and meaning.

Suppose we want to write a program in CIL that simply displays a "Hello Ajay" message on the screen. Despite the simple nature of the program, writing it directly in MSIL is cumbersome because the opcode instructions are not in a user-friendly, English-like format. However, there is a trick. First, write the implementation in C# and compile the project. The corresponding executable is created in the Bin/Debug folder.

using System;

namespace CILComplexTest
{
    class xyz
    {
        private string msg;

        public xyz(string msg)
        {
            this.msg = msg;
        }

        public string display()
        {
            return "Hello " + msg;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            xyz obj = new xyz("Ajay");
            Console.WriteLine(obj.display());
            // Wait for a key press (the ReadKey call in the generated IL)
            Console.ReadKey();
        }
    }
}

Now open the Visual Studio Command Prompt, go to the project's Bin/Debug folder, and execute the following command to disassemble the compiled executable into MSIL code.

ILDASM CILComplexTest.exe /out:test.il

Notice that the test.il file has been created in the Bin/Debug folder. It contains the same implementation as its C# counterpart. You can open this file in any editor, modify it, and compile it back into an executable using the ILASM utility. The automatically generated IL code is as follows.

.assembly extern mscorlib
{
    .publickeytoken = (B77A5C561934E089)
    .ver 4:0:0:0
}

.assembly CILComplexTest
{
  .hash algorithm 0x00008004
  .ver 1:0:0:0
}

.module CILComplexTest.exe
// MVID: {631F60E4-6E43-4355-BC70-DAF16F1FE33A}
.imagebase 0x00400000
.file alignment 0x00000200
.stackreserve 0x00100000
.subsystem 0x0003 // WINDOWS_CUI
.corflags 0x00000003 // ILONLY 32BITREQUIRED
// Image base: 0x003E0000

// =============== CLASS MEMBERS DECLARATION ===================

.class private auto ansi beforefieldinit CILComplexTest.xyz extends [mscorlib]System.Object
{
  .field private string msg
  .method public hidebysig specialname rtspecialname instance void .ctor(string msg) cil managed
  {
    // Code size 17 (0x11)
    .maxstack 8
    IL_0000: ldarg.0
    IL_0001: call instance void [mscorlib]System.Object::.ctor()
    IL_0006: nop
    IL_0007: nop
    IL_0008: ldarg.0
    IL_0009: ldarg.1
    IL_000a: stfld string CILComplexTest.xyz::msg
    IL_000f: nop
    IL_0010: ret
  } // end of method xyz::.ctor

  .method public hidebysig instance string display() cil managed
  {
    // Code size 22 (0x16)
    .maxstack 2
    .locals init (string V_0)
    IL_0000: nop
    IL_0001: ldstr "Hello "
    IL_0006: ldarg.0
    IL_0007: ldfld string CILComplexTest.xyz::msg
    IL_000c: call string [mscorlib]System.String::Concat(string, string)
    IL_0011: stloc.0
    IL_0012: br.s IL_0014

    IL_0014: ldloc.0
    IL_0015: ret
  } // end of method xyz::display

} // end of class CILComplexTest.xyz

.class private auto ansi beforefieldinit CILComplexTest.Program extends [mscorlib]System.Object
{
  .method private hidebysig static void Main(string[] args) cil managed
  {
    .entrypoint
    // Code size 31 (0x1f)
    .maxstack 2
    .locals init (class CILComplexTest.xyz V_0)
    IL_0000: nop
    IL_0001: ldstr "Ajay"
    IL_0006: newobj instance void CILComplexTest.xyz::.ctor(string)
    IL_000b: stloc.0
    IL_000c: ldloc.0
    IL_000d: callvirt instance string CILComplexTest.xyz::display()
    IL_0012: call void [mscorlib]System.Console::WriteLine(string)
    IL_0017: nop
    IL_0018: call valuetype [mscorlib]System.ConsoleKeyInfo [mscorlib]System.Console::ReadKey()
    IL_001d: pop
    IL_001e: ret
  } // end of method Program::Main

  .method public hidebysig specialname rtspecialname instance void .ctor() cil managed
  {
    // Code size 7 (0x7)
    .maxstack 8
    IL_0000: ldarg.0
    IL_0001: call instance void [mscorlib]System.Object::.ctor()
    IL_0006: ret
  } // end of method Program::.ctor

} // end of class CILComplexTest.Program
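
ILDASM is not the only way to look at compiled IL. As a rough sketch, the raw IL bytes of a method can also be read at run time through reflection. Greeter, MethodBodyDemo, and GetIL are hypothetical names, and the exact bytes printed depend on the compiler and build configuration.

```csharp
using System;
using System.Reflection;

// Stand-in for the xyz class from the article.
public class Greeter
{
    private readonly string msg;

    public Greeter(string msg)
    {
        this.msg = msg;
    }

    public string Display()
    {
        return "Hello " + msg;
    }
}

public static class MethodBodyDemo
{
    // Returns the raw IL bytes of a public method.
    public static byte[] GetIL(Type type, string methodName)
    {
        MethodInfo method = type.GetMethod(methodName);
        MethodBody body = method.GetMethodBody();
        return body.GetILAsByteArray();
    }

    public static void Main()
    {
        byte[] il = GetIL(typeof(Greeter), "Display");

        // Hex dump of the method body; contents vary by build.
        Console.WriteLine(BitConverter.ToString(il));

        // The body ends with the one-byte ret opcode (0x2A).
        Console.WriteLine(il[il.Length - 1] == 0x2A);  // True
    }
}
```

ILDASM performs the same kind of read and then decodes each byte into its mnemonic, which is what produces the ldstr, ldfld, and call lines in the listing above.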

Summary

This article provided an overview of the CIL data types and opcode instructions. We analyzed the meaning of individual opcodes in detail and saw how boxing, unboxing, branching, and interfaces appear in the form of CIL opcodes. Finally, we took an introductory look at converting compiled C# code into MSIL instructions using the ILDASM utility.