Chapter 6: Memory Corruption Part II - Heaps

Posted by Addison Wesley Free Book | C# Language November 16, 2009
This chapter discusses a myriad of stability issues that can surface in an application when the heap is used in a nonconventional fashion. Although the stack and the heap are managed very differently in Windows, the process by which we analyze stack- and heap-related problems is the same.

The primary difference between a regular heap block and a normal pageheap block is the addition of pageheap metadata. The pageheap metadata contains information such as the block's requested and actual sizes, but perhaps the most useful member of the metadata is the stack trace. The stack trace member allows the developer to get the full stack trace of the origins of the allocation (that is, where it was allocated). This aids greatly when looking at a corrupt heap block, as it gives you clues to who the owner of the heap block is and affords you the luxury of narrowing down the scope of the code review. Imagine that the HeapAlloc call in Listing 6.6 resulted in the following pointer: 0019e4b8. To dump out the contents of the pageheap metadata, we must first subtract 32 (0x20) bytes from the pointer.

0:000> dd 0019e4b8-0x20
0019e498 abcdaaaa 80081000 00000014 0000003c
0019e4a8 00000018 00000000 0028697c dcbaaaaa
0019e4b8 e0e0e0e0 e0e0e0e0 e0e0e0e0 e0e0e0e0
0019e4c8 e0e0e0e0 a0a0a0a0 a0a0a0a0 00000000
0019e4d8 00000000 00000000 000a0164 00001000
0019e4e8 00180178 00180178 00000000 00000000
0019e4f8 00000000 00000000 00000000 00000000
0019e508 00000000 00000000 00000000 00000000

Here, we can clearly see the starting (abcdaaaa) and ending (dcbaaaaa) fill patterns that enclose the metadata. To see the pageheap metadata in a more digestible form, we can use the _DPH_BLOCK_INFORMATION data type:

0:000> dt _DPH_BLOCK_INFORMATION 0019e4b8-0x20
+0x000 StartStamp : 0xabcdaaaa
+0x004 Heap : 0x80081000
+0x008 RequestedSize : 0x14
+0x00c ActualSize : 0x3c
+0x010 FreeQueue : _LIST_ENTRY [ 0x18 - 0x0 ]
+0x010 TraceIndex : 0x18
+0x018 StackTrace : 0x0028697c
+0x01c EndStamp : 0xdcbaaaaa
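
To make the mapping between the raw dd dump and the dt view concrete, the following is a minimal Python sketch that decodes the same 32 bytes offline. It is not part of the debugger; the eight words are copied from the dd output above, and the field order is taken from the dt output. Treating every field as a 32-bit value (including TraceIndex, which shares its offset with FreeQueue) is a simplifying assumption.

```python
import struct

# The eight 32-bit words that dd printed for the 0x20-byte header,
# copied from the dump above (0019e498 through 0019e4b4).
words = [0xabcdaaaa, 0x80081000, 0x00000014, 0x0000003c,
         0x00000018, 0x00000000, 0x0028697c, 0xdcbaaaaa]
raw = struct.pack("<8I", *words)  # little-endian, as laid out on x86

# Offsets follow the dt output. The 8 bytes at +0x10 are shared by
# FreeQueue and TraceIndex; this sketch reads only the first word.
(start_stamp, heap, requested, actual,
 trace_index, _second_word, stack_trace, end_stamp) = struct.unpack("<8I", raw)

header = {
    "StartStamp": start_stamp,
    "Heap": heap,
    "RequestedSize": requested,
    "ActualSize": actual,
    "TraceIndex": trace_index,
    "StackTrace": stack_trace,
    "EndStamp": end_stamp,
}

for name, value in header.items():
    print(f"{name:14} 0x{value:08x}")

# The two stamps bracket the metadata, as described in the text.
stamps_ok = (start_stamp, end_stamp) == (0xabcdaaaa, 0xdcbaaaaa)
```

If either stamp differs from the expected pattern, the metadata itself has probably been overwritten, and the remaining fields should be treated with suspicion.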

The stack trace member contains the stack trace of the allocation. To see the stack trace, we have to use the dds command, which displays the contents of a range of memory under the assumption that the contents in the range are a series of addresses in the symbol table.

0:000> dds 0x0028697c
0028697c abcdaaaa
00286980 00000001
00286984 00000006
...
...
...
0028699c 7c949d18 ntdll!RtlAllocateHeapSlowly+0x44
002869a0 7c91b298 ntdll!RtlAllocateHeap+0xe64
002869a4 01001224 06overrun!DupString+0x24
002869a8 010011eb 06overrun!wmain+0x2b
002869ac 010013a9 06overrun!wmainCRTStartup+0x12b
002869b0 7c816d4f kernel32!BaseProcessStart+0x23
002869b4 00000000
002869b8 00000000
...
...
...
The shortened version of the output of the dds command shows us the stack trace of the allocating code. I cannot stress the usefulness of the recorded stack trace database enough. Whether you are looking at heap corruptions or memory leaks, given any pageheap block, you can very easily get to the stack trace of the allocating code, which in turn allows you to focus your efforts on that area of the code.
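
When the dds output is long, it can help to filter it down to the entries that resolved to a symbol. The sketch below does this in Python on text pasted from the debugger. The helper name resolved_frames is hypothetical, and the input is simply the tail of the dds output shown above.

```python
# Tail of the dds output shown above, pasted as plain text.
dds_output = """\
0028699c 7c949d18 ntdll!RtlAllocateHeapSlowly+0x44
002869a0 7c91b298 ntdll!RtlAllocateHeap+0xe64
002869a4 01001224 06overrun!DupString+0x24
002869a8 010011eb 06overrun!wmain+0x2b
002869ac 010013a9 06overrun!wmainCRTStartup+0x12b
002869b0 7c816d4f kernel32!BaseProcessStart+0x23
002869b4 00000000
002869b8 00000000
"""

def resolved_frames(text):
    """Return (return_address, symbol) for each line that has a symbol."""
    frames = []
    for line in text.splitlines():
        parts = line.split()
        # A resolved entry has three columns: slot, value, module!symbol+offset.
        if len(parts) == 3 and "!" in parts[2]:
            frames.append((int(parts[1], 16), parts[2]))
    return frames

frames = resolved_frames(dds_output)

# The first frame outside ntdll is usually the code that asked for the memory.
caller = next(symbol for _, symbol in frames if not symbol.startswith("ntdll!"))
print(caller)
```

For this trace, the first frame outside ntdll is in DupString, which is where a code review would start.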

Now let's see how the normal pageheap facility can be used to track down the memory corruption shown earlier in Listing 6.6. Enable normal pageheap on the application (see Appendix A, "Application Verifier Test Settings"), and start the process under the debugger using ThisStringShouldReproTheCrash as input. Listing 6.8 shows how Application Verifier breaks execution because of a corrupted heap block.

Listing 6.8 Application Verifier reported heap block corruption
...
...
...
0:000> g
Press any key to start
Copy of string: ThisStringShouldReproTheCrash
=======================================
VERIFIER STOP 00000008 : pid 0x640: Corrupted heap block.
00081000 : Heap handle used in the call.
001A04D0 : Heap block involved in the operation.
00000014 : Size of the heap block.
00000000 : Reserved
=======================================
This verifier stop is not continuable. Process will be terminated
when you use the `go' debugger command.
=======================================
(640.6a8): Break instruction exception - code 80000003 (first chance)
eax=000001ff ebx=0040acac ecx=7c91eb05 edx=0006f949 esi=00000000 edi=000001ff
eip=7c901230 esp=0006f9dc ebp=0006fbdc iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
ntdll!DbgBreakPoint:
7c901230 cc int 3

The information presented by Application Verifier gives us the pointer to the heap block that was corrupted. From here, getting the stack trace of the allocating code is trivial.

0:000> dt _DPH_BLOCK_INFORMATION 001A04D0-0x20
+0x000 StartStamp : 0xabcdaaaa
+0x004 Heap : 0x80081000
+0x008 RequestedSize : 0x14
+0x00c ActualSize : 0x3c
+0x010 FreeQueue : _LIST_ENTRY [ 0x18 - 0x0 ]
+0x010 TraceIndex : 0x18
+0x018 StackTrace : 0x0028697c
+0x01c EndStamp : 0xdcbaaaaa
0:000> dds 0x0028697c
0028697c abcdaaaa
00286980 00000001
00286984 00000006
00286988 00000001
0028698c 00000014
00286990 00081000
00286994 00000000
00286998 0028699c
0028699c 7c949d18 ntdll!RtlAllocateHeapSlowly+0x44
002869a0 7c91b298 ntdll!RtlAllocateHeap+0xe64
002869a4 01001202 06overrun!DupString+0x22
002869a8 010011c1 06overrun!wmain+0x31
002869ac 0100138d 06overrun!wmainCRTStartup+0x12f
002869b0 7c816fd7 kernel32!BaseProcessStart+0x23
...
...
...
Knowing the stack trace allows us to efficiently find the culprit by narrowing down the scope of the code review.
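
The reason normal pageheap can report the corruption only after the fact is that it relies on fill patterns being checked at some later point. The following Python sketch is a rough model of that idea, not the actual verifier logic. The 0xe0 and 0xa0 fill bytes are read off the earlier dd dump, the 8-byte suffix length is inferred from ActualSize (0x3c = 0x20 + 0x14 + 8), and the function names are hypothetical.

```python
HEADER_SIZE = 0x20      # pageheap metadata that precedes the user area
REQUESTED = 0x14        # RequestedSize from the metadata
FILL = b"\xe0"          # fill seen in the fresh user area of the dump
SUFFIX = b"\xa0" * 8    # fill seen directly after the user area

def make_block(requested=REQUESTED):
    """Model the user area followed by its suffix fill pattern."""
    return bytearray(FILL * requested + SUFFIX)

def copy_unchecked(block, text):
    """Copy a NUL-terminated UTF-16 string without checking the user size."""
    data = (text + "\0").encode("utf-16-le")
    count = min(len(data), len(block))  # the model stops at the end of the block
    block[:count] = data[:count]

def suffix_intact(block, requested=REQUESTED):
    """The after-the-fact check: is the suffix fill pattern untouched?"""
    return bytes(block[requested:]) == SUFFIX

good = make_block()
copy_unchecked(good, "Short")                          # 12 bytes, fits in 0x14
bad = make_block()
copy_unchecked(bad, "ThisStringShouldReproTheCrash")   # 60 bytes, does not fit

print(suffix_intact(good), suffix_intact(bad))
```

Nothing in this model stops the bad copy while it is happening. The damage becomes visible only when suffix_intact is eventually called, which mirrors the backtracking that normal pageheap still leaves to us.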

If you compare the approach of finding out why a process crashed without Application Verifier with the Application Verifier-enabled approach, you will quickly see how much more efficient the latter is. With normal pageheap, all the information regarding the corrupted block is given to us, and we can use it to analyze the heap block and get the stack trace of the allocating code. Although normal pageheap breaks execution and gives us all this useful information, it still does so only after a corruption has occurred, and it still requires us to do some backtracking to figure out why it happened.

Is there a mechanism to break execution even closer to the corruption? Absolutely! Normal pageheap is only one of the two modes of pageheap that can be enabled. The other mode is known as full pageheap. In addition to its own unique fill patterns, full pageheap adds the notion of a guard page to each heap block. A guard page is a page of inaccessible memory that is placed either at the start or at the end of a heap block. Placing the guard page at the start of the heap block protects against heap block underruns, and placing it at the end protects against heap block overruns. Figure 6.11 illustrates the layout of a full pageheap block.



Figure 6.11 Full page heap block layout

The inaccessible page is added to protect against heap block overruns or underruns. If a faulty piece of code writes to the inaccessible page, it causes an access violation, and execution breaks on the spot. This allows us to avoid any type of backtracking strategy to figure out the origins of the corruption.

Now we can once again run our sample application, this time with full pageheap enabled (see Appendix A), and see where the debugger breaks execution.
...
...
...
0:000> g
Press any key to start
(414.494): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=006f006f ebx=7ffd7000 ecx=005d5000 edx=006fefd8 esi=7c9118f1 edi=00011970
eip=77c47ea2 esp=0006ff20 ebp=0006ff20 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010202
msvcrt!wcscpy+0xe:
77c47ea2 668901 mov word ptr [ecx],ax ds:0023:005d5000=????
0:000> kb
ChildEBP RetAddr Args to Child
0006ff20 01001221 005d4fe8 006fefc0 00000000 msvcrt!wcscpy+0xe
0006ff34 010011c1 006fefc0 00000000 0006ffc0 06overrun!DupString+0x41
0006ff44 0100138d 00000002 006fef98 00774f88 06overrun!wmain+0x31
0006ffc0 7c816fd7 00011970 7c9118f1 7ffd7000 06overrun!wmainCRTStartup+0x12f
0006fff0 00000000 0100125e 00000000 78746341 kernel32!BaseProcessStart+0x23

This time, an access violation is recorded during the string copy call. If we take a closer look at the heap block at the point of the access violation, we see the following:

0:000> dd 005d4fe8
005d4fe8 00680054 00730069 00740053 00690072
005d4ff8 0067006e 00680053 ???????? ????????
005d5008 ???????? ???????? ???????? ????????
005d5018 ???????? ???????? ???????? ????????
005d5028 ???????? ???????? ???????? ????????
005d5038 ???????? ???????? ???????? ????????
005d5048 ???????? ???????? ???????? ????????
005d5058 ???????? ???????? ???????? ????????
0:000> du 005d4fe8
005d4fe8 "ThisStringSh????????????????????"
005d5028 "????????????????????????????????"
005d5068 "????????????????????????????????"
005d50a8 "????????????????????????????????"
005d50e8 "????????????????????????????????"
005d5128 "????????????????????????????????"
005d5168 "????????????????????????????????"
005d51a8 "????????????????????????????????"
005d51e8 "????????????????????????????????"
005d5228 "????????????????????????????????"
005d5268 "????????????????????????????????"
005d52a8 "????????????????????????????????"

We can make two important observations about the dumps:

  • The string we are copying has overwritten the suffix fill pattern of the block, as well as the heap entry.
  • At the point of the access violation, the string copied so far is ThisStringSh, which indicates that the string copy function is not yet done and is about to write to the inaccessible page placed at the end of the heap block by Application Verifier.
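
The addresses in the dumps let us check these observations with a little arithmetic. The Python sketch below is only a worked calculation. The block address and the faulting address are taken from the kb and register output above, the user size is the 0x14 bytes requested, and the 8-byte allocation granularity is an assumption inferred from where the block was placed.

```python
PAGE_SIZE = 0x1000
ALIGNMENT = 8             # assumed granularity, inferred from the addresses

def round_up(value, multiple):
    return (value + multiple - 1) // multiple * multiple

user_addr = 0x005d4fe8    # destination passed to wcscpy in the kb output
user_size = 0x14          # bytes requested for the block
fault_addr = 0x005d5000   # address written by the faulting mov instruction

# The guard page starts at the first page boundary past the block.
guard_page = round_up(user_addr + user_size, PAGE_SIZE)

# Placing the block as close to the guard page as the alignment allows
# leaves a few bytes of slack between the block and the guard page.
placed_addr = guard_page - round_up(user_size, ALIGNMENT)
slack = guard_page - (user_addr + user_size)

# wchar_t is 2 bytes on Windows, so this many characters were stored
# before the write that touched the guard page.
chars_before_fault = (fault_addr - user_addr) // 2

source = "ThisStringShouldReproTheCrash"
print(hex(guard_page), slack, source[:chars_before_fault])
```

Under these assumptions, the block holds 10 characters, the next 2 land in the 4 bytes of slack, and the thirteenth write faults, which matches the ThisStringSh seen in the du output.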

By enabling full pageheap, we were able to break execution when the corruption occurred rather than after. This can be a huge time-saver, as you have the offending code right in front of you when the corruption occurs, and finding out why the corruption occurred just got a lot easier. One of the questions that might be going through your mind is, "Why not always run with full pageheap enabled?" Well, full pageheap is very resource intensive. Remember that full pageheap places one page of inaccessible memory at the end (or beginning) of each allocation. If the process you are debugging is memory hungry, the usage of pageheap might increase the overall memory consumption by an order of magnitude.
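
To get a feel for the cost, the following Python sketch estimates the virtual memory that one full pageheap allocation occupies. It is a lower-bound model under stated assumptions: a 4KB page size, whole data pages plus exactly one guard page per block, and no accounting for the pageheap metadata. The function name is hypothetical.

```python
PAGE_SIZE = 0x1000  # assumed page size (4KB on x86)

def full_pageheap_footprint(size, page_size=PAGE_SIZE):
    """Estimate bytes of address space used by one full pageheap block."""
    data_pages = max(1, -(-size // page_size))  # ceiling division
    return (data_pages + 1) * page_size         # data pages plus one guard page

small = full_pageheap_footprint(0x14)       # the 20-byte block from our sample
large = full_pageheap_footprint(0x10000)    # a 64KB allocation

print(hex(small), small // 0x14)
print(hex(large), round(large / 0x10000, 2))
```

The model suggests that the overhead is dominated by small allocations: a 20-byte block occupies two full pages, whereas a 64KB block grows by only a single guard page.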

In addition to heap block overruns, we can experience the opposite: heap underruns. Although not as common, heap underruns overwrite the part of the heap block that precedes the user-accessible part. This can happen when bad pointer arithmetic causes a write before the start of the user-accessible part. Because normal pageheap protects the pageheap metadata by using fill patterns, it can trap heap underrun scenarios as well. Full pageheap, by default, places a guard page at the end of the heap block and will not break on heap underruns. Fortunately, using the backward overrun option of full pageheap (see Appendix A), we can tell it to place a guard page at the front of the allocation rather than at the end and trap the underrun class of problems as well.

The !heap extension command previously used to analyze heap state can also be used when the process is running under pageheap. By using the -p flag, we can tell the !heap extension command that the heap in question is pageheap enabled. The options available for the -p flag are:

!heap -p            Dump all page heaps.
!heap -p -h ADDR    Detailed dump of page heap at ADDR.
!heap -p -a ADDR    Figure out what heap block is at ADDR.
!heap -p -t [N]     Dump N collected traces with heavy heap users.
!heap -p -tc [N]    Dump N traces sorted by count usage (eqv. with -t).
!heap -p -ts [N]    Dump N traces sorted by size.
!heap -p -fi [N]    Dump last N fault injection traces.

For example, the heap block returned from the HeapAlloc call in our sample application resembles the following when used with the -p and -a flags:

0:000> !heap -p -a 005d4fe8
address 005d4fe8 found in
_DPH_HEAP_ROOT @ 81000
in busy allocation ( DPH_HEAP_BLOCK: UserAddr UserSize - VirtAddr VirtSize)
8430c: 5d4fe8 14 - 5d4000 2000
7c91b298 ntdll!RtlAllocateHeap+0x00000e64
01001202 06overrun!DupString+0x00000022
010011c1 06overrun!wmain+0x00000031
0100138d 06overrun!wmainCRTStartup+0x0000012f
7c816fd7 kernel32!BaseProcessStart+0x00000023

The output shows us the recorded stack trace as well as other auxiliary information, such as which fill pattern is in use. The fill patterns can give us clues to the status of the heap block (allocated or freed).

Another useful switch is the -t switch. The -t switch allows us to dump out part of the stack trace database to get more information about all the stacks that have allocated memory. If you are debugging a process that is using up a ton of memory and want to know which part of the process is responsible for the biggest allocations, the !heap -p -t command can be used.
