Into The Abyss - From C# To X64 Assembler And Memory Dump

During my preparation for an interview, I was revisiting the concept of structures and classes. The concept seems well understood, and you can even find diagrams of how those types are represented in memory. But when I decided to actually look at the memory of a .NET process, I could not find an article that explains how to do that. This article compiles different articles and my own findings in one place.
 
I will concentrate on the Microsoft Console Debugger (cdb.exe) and how to use it to see the stack memory of managed methods. The same should be possible with WinDbg.exe, but I didn’t use it. Both are available only on Windows; there are alternatives for Unix, which are not covered here.
 
I’ll be using .NET 5 and the application will be built in Debug configuration. The code will be executed on Windows 10, 64 bit. Hence addresses in memory will take 8 bytes (e.g. 000000dd`9b17e828).
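
To make the address notation concrete, here is a minimal C# sketch that parses an address the way cdb prints it. The helper name DebuggerAddress.Parse is my own and is not part of any debugger API; the only assumption is that the backtick is a visual separator between the high and low 32 bits.

```csharp
using System;
using System.Globalization;

static class DebuggerAddress
{
    // Parses an address in the form cdb prints it, e.g. "000000dd`9b17e828".
    // The backtick only separates the high and low 32 bits for readability.
    public static ulong Parse(string text) =>
        ulong.Parse(text.Replace("`", ""), NumberStyles.HexNumber, CultureInfo.InvariantCulture);

    static void Main()
    {
        ulong sp = Parse("000000dd`9b17e828");

        // IntPtr.Size reports the pointer width of the current process.
        Console.WriteLine($"pointer size in this process: {IntPtr.Size} bytes");
        Console.WriteLine($"parsed address: 0x{sp:x16}");
    }
}
```

Storing addresses as ulong keeps the arithmetic on them unsigned, which is how the debugger treats them.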
 
If you want to reproduce the steps from this article, you need to install the cdb tool and the SOS extension. Cdb is part of Debugging Tools for Windows in the Windows SDK; follow the guide in the documentation to set it up: Debugging Tools for Windows (WinDbg, KD, CDB, NTSD) - Windows drivers.
 
For the SOS extension, run dotnet tool install -g dotnet-sos and then dotnet-sos install. The second command installs SOS to your user profile and, when it finishes, prints the command to load SOS in the debugger: dotnet-sos - .NET Core
 
We've set up the scene. Now, let’s dive in.
 

Hello world and beyond

 
Let’s start with a basic C# program:
using System;
using System.Diagnostics;

class Program {
    static void Main() {
        Console.WriteLine("Hello World!");
        // Raises a breakpoint that the attached debugger will stop on.
        Debugger.Break();
    }
}
We need to build it and then run the Console Debugger from the command line. First, I set the current directory to my project output folder so that I can omit the path to my binaries.
 
> "C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\cdb.exe" HelloWorld.exe
Microsoft (R) Windows Debugger Version 10.0.19041.1 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.
[...]
 
The debugger will load the necessary modules and stop. At that point the CLR is not loaded yet. We need to use the g command to run until the break:
 
0:000> g
ModLoad: 00007ff8`584d0000 00007ff8`584fe000 C:\Windows\System32\IMM32.DLL
[...]
Hello World!
(3f28.2c64): Break instruction exception - code 80000003 (first chance)
KERNELBASE!wil::details::DebugBreak+0x2:
00007ff8`572e1a52 cc int 3
 
As you can see, it displayed Hello World! and stopped at Debugger.Break(). Now we can load the SOS extension (this is possible only after the CLR has loaded), and we are ready to explore the memory of our little application.
 
0:000> .load C:\Users\eugen\.dotnet\sos\sos.dll
 

Call stack and stack memory 

 
With the k command we can explore the unmanaged call stack. Note the raw address 0x00007ff7`b2685f01 in the Call Site column: that frame is our managed function, for which the debugger has no symbol name.
 
0:000> k
Child-SP RetAddr Call Site
000000dd`9b17e828 00007ff8`12297c89 KERNELBASE!wil::details::DebugBreak+0x2
000000dd`9b17e830 00007ff8`00989e3a coreclr!DebugDebugger::Break+0x149
000000dd`9b17e9b0 00007ff7`b2685f01 System_Private_CoreLib!System.Diagnostics.Debugger.Break()$##6005841+0xa
000000dd`9b17e9e0 00007ff8`121c9e33 0x00007ff7`b2685f01
000000dd`9b17ea10 00007ff8`1210869c coreclr!CallDescrWorkerInternal+0x83
000000dd`9b17ea50 00007ff8`121112ab coreclr!MethodDescCallSite::CallTargetWorker+0x268
(Inline Function) --------`-------- coreclr!MethodDescCallSite::Call+0xb
000000dd`9b17eb90 00007ff8`12111076 coreclr!RunMainInternal+0x11f
000000dd`9b17ecc0 00007ff8`12110bd9 coreclr!RunMain+0xd2
000000dd`9b17ed70 00007ff8`121515c8 coreclr!Assembly::ExecuteMainMethod+0x1cd
[...]
 
We can explore the managed stack with the !clrstack command from the SOS extension (or !clrstack -a -i if you also want to see parameters and locals):
 
0:000> !clrstack
OS Thread Id: 0x2c64 (0)
Child SP IP Call Site
000000DD9B17E8A8 00007ff8572e1a52 [HelperMethodFrame: 000000dd9b17e8a8] System.Diagnostics.Debugger.BreakInternal()
000000DD9B17E9B0 00007FF800989E3A System.Diagnostics.Debugger.Break()
000000DD9B17E9E0 00007FF7B2685F01 HelloWorld.Program.Main()
 
Here we see the Child SP (stack pointer) and IP (instruction pointer) for the HelloWorld.Program.Main() method. They point to the top of the stack and to the next instruction to execute in the context of that method. It’s important to note that the managed stack is a subset of the unmanaged stack, so we are not limited to SOS commands when exploring a .NET application.
 
Now we have an SP that points to the top of the stack memory for the Main() function, but we don’t know where the frame starts. One way to find out is to rely on cdb and use the .frame /r command, which unwinds stack frames and shows the registers for a specific call (you can use kn to show frame numbers):
 
0:000> .frame /r 3
03 000000dd`9b17e9e0 00007ff8`121c9e33 0x00007ff7`b2685f01
rax=000000000002a020 rbx=000000dd9b17eb08 rcx=0000000000000010
rdx=000000000000000b rsi=000000dd9b17eab8 rdi=000000dd9b17ec68
rip=00007ff7b2685f01 rsp=000000dd9b17e9e0 rbp=000000dd9b17ea00
r8=0000000000000010 r9=000002740000ebb0 r10=0000000000000000
r11=000000dd9b17e8d0 r12=000000dd9b17ecc8 r13=000000dd9b17ebe0
r14=0000000000000000 r15=0000000000000004
iopl=0 nv up ei pl zr na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246
00007ff7`b2685f01 90 nop
 
In this case the RSP register points to the top of the stack and RBP (base pointer) points to the start of the frame. In general there is no guarantee that RBP will be used this way, but it seems to be the case for .NET. I will add 8 bytes to RBP to illustrate that the slot right after the frame base holds the return address.
 
0:000> dps 000000dd9b17e9e0 000000dd9b17ea00+8
000000dd`9b17e9e0 00000000`00000000
000000dd`9b17e9e8 000000dd`9b17e988
000000dd`9b17e9f0 00000274`753cb410
000000dd`9b17e9f8 00000274`0000ebb0
000000dd`9b17ea00 000000dd`9b17ea30
000000dd`9b17ea08 00007ff8`121c9e33 coreclr!CallDescrWorkerInternal+0x83
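
The address math behind that dps range can be checked with a few lines of C#. This is only a sketch of the arithmetic; FrameMath and its methods are names I made up, and the register values are copied from the .frame /r 3 output.

```csharp
using System;

static class FrameMath
{
    const ulong SlotSize = 8; // one stack slot on x64

    // Bytes between the top of the frame (RSP) and the frame base (RBP).
    public static ulong LocalAreaSize(ulong rsp, ulong rbp) => rbp - rsp;

    // Number of 8-byte lines dps prints for an inclusive range [start, end].
    public static ulong SlotCount(ulong start, ulong end) => (end - start) / SlotSize + 1;

    static void Main()
    {
        ulong rsp = 0x000000dd9b17e9e0UL; // rsp from .frame /r 3
        ulong rbp = 0x000000dd9b17ea00UL; // rbp from .frame /r 3

        Console.WriteLine($"RBP - RSP = 0x{LocalAreaSize(rsp, rbp):x}");
        Console.WriteLine($"return address slot = 0x{rbp + SlotSize:x16}");
        Console.WriteLine($"slots shown by dps = {SlotCount(rsp, rbp + SlotSize)}");
    }
}
```

The difference is 0x20 bytes, and the inclusive range up to RBP+8 covers six slots, which matches the six lines in the dump.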
 
It seems now is a good time for a bit of theory. Here is a summary of unmanaged x64 stack usage:
 
[Figure: summary of unmanaged x64 stack usage]
 
We need to remember that stack memory grows from higher to lower addresses, so to read a dump in the order the values were pushed you need to start from the last line (be careful: in the picture the stack is drawn from top to bottom, which can be confusing). In address math such as 000000dd9b17ea00+8, + means “go closer to the bottom/start” of the stack and - means “go closer to the top/end”.
 
Before calling a function, the caller is responsible for passing some parameters in registers and allocating memory on top of the stack for stack parameters. The call instruction pushes the return address and transfers control to the function; the prolog then allocates memory on the stack for local variables, saved registers and parameters. If a parameter is passed in a register, stack memory for that parameter is still always allocated, even if it’s not used. In case of dynamic allocation of stack memory, a nonvolatile register (in our case RBP) must be used as a pointer to the base of the fixed part of the stack and must be saved in the prolog.
 

IL and x64 asm

 
To understand the structure of the stack memory, we need to analyze the function prolog. We will use !u -il <instruction_pointer> to see the native code together with the IL:
 
0:000> !U -il 00007FF7B2685F01
Normal JIT generated code
HelloWorld.Program.Main()
ilAddr is 00000274752D2050 pImport is 0000019592B4CA20
IL_0000: nop
IL_0001: ldstr "Hello World!"
IL_0006: call void System.Console::WriteLine(string)
IL_000b: nop
IL_000c: call void System.Diagnostics.Debugger::Break()
IL_0011: nop
IL_0012: ret
Begin 00007FF7B2685ED0, size 39
00007ff7`b2685ed0 55 push rbp
00007ff7`b2685ed1 4883ec20 sub rsp,20h
00007ff7`b2685ed5 488d6c2420 lea rbp,[rsp+20h]
00007ff7`b2685eda 833d8fcb090000 cmp dword ptr [00007ff7`b2722a70],0
00007ff7`b2685ee1 7405 je 00007ff7`b2685ee8
00007ff7`b2685ee3 e89840c95f call coreclr!JIT_DbgIsJustMyCode (00007ff8`12319f80)
IL_0000: nop
00007ff7`b2685ee8 90 nop
IL_0001: ldstr "Hello World!"
00007ff7`b2685ee9 48b9c030001074020000 mov rcx,274100030C0h
00007ff7`b2685ef3 488b09 mov rcx,qword ptr [rcx]
IL_0006: call void System.Console::WriteLine(string)
00007ff7`b2685ef6 e8b5ffffff call 00007ff7`b2685eb0
IL_000b: nop
00007ff7`b2685efb 90 nop
IL_000c: call void System.Diagnostics.Debugger::Break()
00007ff7`b2685efc e8affbffff call 00007ff7`b2685ab0 (System.Diagnostics.Debugger.Break(), mdToken: 0000000006005841)
IL_0011: nop
>>> 00007ff7`b2685f01 90 nop
IL_0012: ret
00007ff7`b2685f02 90 nop
00007ff7`b2685f03 488d6500 lea rsp,[rbp]
00007ff7`b2685f07 5d pop rbp
00007ff7`b2685f08 c3 ret
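
The call targets in this listing can be recomputed from the raw bytes. For the 5-byte E8 form of call, the target is the address of the next instruction plus a signed 32-bit little-endian displacement. Below is a small C# sketch of that rule; RelativeCall.Target is a helper name of my own, and the bytes and addresses are copied from the listing above.

```csharp
using System;
using System.Buffers.Binary;

static class RelativeCall
{
    // For the 5-byte "E8 rel32" near call, the target is the address of the
    // next instruction plus a signed 32-bit little-endian displacement.
    public static ulong Target(ulong instructionAddress, ReadOnlySpan<byte> encoding)
    {
        if (encoding.Length != 5 || encoding[0] != 0xE8)
            throw new ArgumentException("expected a 5-byte E8 rel32 call", nameof(encoding));

        int displacement = BinaryPrimitives.ReadInt32LittleEndian(encoding.Slice(1));
        ulong nextInstruction = instructionAddress + 5;
        return (ulong)((long)nextInstruction + displacement);
    }

    static void Main()
    {
        byte[] callWriteLine = { 0xE8, 0xB5, 0xFF, 0xFF, 0xFF }; // at 00007ff7`b2685ef6
        byte[] callBreak = { 0xE8, 0xAF, 0xFB, 0xFF, 0xFF };     // at 00007ff7`b2685efc

        Console.WriteLine($"0x{Target(0x00007ff7b2685ef6UL, callWriteLine):x16}");
        Console.WriteLine($"0x{Target(0x00007ff7b2685efcUL, callBreak):x16}");
    }
}
```

The two results are 00007ff7`b2685eb0 and 00007ff7`b2685ab0, the same targets the debugger prints next to the call instructions. The displacement is relative to the next instruction, which is also why the return address seen on the stack for Debugger.Break() is 00007ff7`b2685f01.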
 
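A function consists of a prolog, a body and an epilog. It’s quite easy to guess that the prolog is at the start of the listing: the first three instructions set up the stack frame, and the cmp/je/call that follow are the Just My Code check emitted for debugging (you can use the .fnent command to get the exact prolog size):
 
00007ff7`b2685ed0 55 push rbp
00007ff7`b2685ed1 4883ec20 sub rsp,20h
00007ff7`b2685ed5 488d6c2420 lea rbp,[rsp+20h]
00007ff7`b2685eda 833d8fcb090000 cmp dword ptr [00007ff7`b2722a70],0
00007ff7`b2685ee1 7405 je 00007ff7`b2685ee8
00007ff7`b2685ee3 e89840c95f call coreclr!JIT_DbgIsJustMyCode (00007ff8`12319f80)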
 
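When the first instruction executes, RSP points to 000000dd`9b17ea08, where the call instruction stored the return address. The push instruction subtracts 8 bytes from RSP (the stack grows down) and saves the caller’s RBP value at the new top, 000000dd`9b17ea00. Then sub allocates 32 bytes (0x20) for local variables, and lea sets RBP to RSP+0x20, which is 000000dd`9b17ea00 again.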
 
0:000> dps 000000dd9b17e9e0 000000dd9b17ea00+8
000000dd`9b17e9e0 00000000`00000000 <| RSP points here
000000dd`9b17e9e8 000000dd`9b17e988 <|
000000dd`9b17e9f0 00000274`753cb410 <|
000000dd`9b17e9f8 00000274`0000ebb0 <| these four slots are the 32 bytes for locals and parameters
000000dd`9b17ea00 000000dd`9b17ea30 <= saved RBP of the caller; RBP points here
000000dd`9b17ea08 00007ff8`121c9e33 coreclr!CallDescrWorkerInternal+0x83 <= return address
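
To double-check the register values, the three frame-setup instructions can be replayed on a toy stack. This is a sketch, not a real emulator: PrologSketch is a hypothetical helper, the stack is modelled as a dictionary from address to value, and I assume that RBP held 000000dd`9b17ea30 on entry because that is the value saved in the dump.

```csharp
using System;
using System.Collections.Generic;

static class PrologSketch
{
    // Replays push rbp / sub rsp,20h / lea rbp,[rsp+20h] on a toy stack.
    // Returns the final RSP and RBP and records the write made by push.
    public static (ulong Rsp, ulong Rbp) Run(ulong rspAtEntry, ulong callerRbp,
                                             IDictionary<ulong, ulong> stack)
    {
        ulong rsp = rspAtEntry;
        ulong rbp = callerRbp;

        // push rbp: move RSP down one slot, then store the caller's RBP there.
        rsp -= 8;
        stack[rsp] = rbp;

        // sub rsp,20h: reserve 32 bytes.
        rsp -= 0x20;

        // lea rbp,[rsp+20h]: RBP now marks the base of the frame.
        rbp = rsp + 0x20;

        return (rsp, rbp);
    }

    static void Main()
    {
        var stack = new Dictionary<ulong, ulong>();

        // At entry RSP points at the return address pushed by the call instruction.
        var (rsp, rbp) = Run(0x000000dd9b17ea08UL, 0x000000dd9b17ea30UL, stack);

        Console.WriteLine($"rsp   = 0x{rsp:x16}");
        Console.WriteLine($"rbp   = 0x{rbp:x16}");
        Console.WriteLine($"[rbp] = 0x{stack[rbp]:x16}");
    }
}
```

The sketch ends with rsp at 000000dd`9b17e9e0 and rbp at 000000dd`9b17ea00, the same values that .frame /r reported, and the slot at rbp holds the saved 000000dd`9b17ea30.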
 
To investigate objects in memory we can use !dumpobj or its short form !do. Let’s try to find the "Hello World!" string.
 
0:000> !do 274100030C0
<Note: this object has an invalid CLASS field>

Invalid object

 
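That’s not a valid .NET object. After dumping the memory at that location we can guess that it holds the address of the object, that is, it is a pointer to a pointer. This matches the mov rcx,qword ptr [rcx] instruction that follows the mov of this constant in the listing: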
 
0:000> dps 00000274100030C0
00000274`100030c0 00000274`0000c5f0
00000274`100030c8 00000274`0000c698
00000274`100030d0 00000274`0000c720
 
0:000> !do 00000274`0000c5f0
Name: System.String
MethodTable: 00007ff7b2707a78
EEClass: 00007ff7b26f5ce0
Size: 46(0x2e) bytes
File: C:\Program Files\dotnet\shared\Microsoft.NETCore.App\5.0.0-rc.2.20475.5\System.Private.CoreLib.dll
String: Hello World!
Fields:
MT Field Offset Type VT Attr Value Name
00007ff7b264b258 4000212 8 System.Int32 1 instance 12 _stringLength
00007ff7b2648070 4000213 c System.Char 1 instance 48 _firstChar
00007ff7b2707a78 4000211 c0 System.String 0 static 0000027400001520 Empty
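
The reported size of 46 bytes can be reproduced from the field offsets in this dump. The sketch below is my reading of the layout on 64-bit .NET 5, not a documented contract: I assume 8 bytes of object header before the method table pointer, the 8-byte method table pointer at offset 0, the 4-byte _stringLength at offset 8, and UTF-16 characters starting at offset c with a terminating zero character. StringLayout is a hypothetical helper.

```csharp
using System;

static class StringLayout
{
    const int ObjectHeader = 8;       // header stored before the method table pointer (assumed)
    const int MethodTablePointer = 8; // offset 0 of the object
    const int LengthField = 4;        // _stringLength at offset 8

    // Estimated size: fixed part plus UTF-16 characters and a terminating '\0'.
    public static int EstimatedSize(string s) =>
        ObjectHeader + MethodTablePointer + LengthField + sizeof(char) * (s.Length + 1);

    static void Main()
    {
        const string text = "Hello World!";

        Console.WriteLine($"estimated size: {EstimatedSize(text)} (0x{EstimatedSize(text):x})");
        Console.WriteLine($"first char: '{text[0]}' = 0x{(int)text[0]:x}");
    }
}
```

For "Hello World!" this gives 8 + 8 + 4 + 2 * 13 = 46 bytes. The value 48 shown for _firstChar is hexadecimal: it is the UTF-16 code of 'H'.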
 

Conclusion

 
It’s definitely not easy to look inside a running application or a memory dump, but it can be a useful way to investigate framework internals and to deepen your understanding of C#/.NET and Windows development.
 
Commands used:
 
g, .load, k, .frame, dps
!clrstack, !u, !do
Other useful commands:
sxe, .fnent
!help, !heap, !dso