FREE BOOK

Chapter 6: Memory Corruption Part II - Heaps

Posted by Addison Wesley Free Book | C# Language November 16, 2009
This chapter discusses a myriad of stability issues that can surface in an application when the heap is used in a nonconventional fashion. Although the stack and the heap are managed very differently in Windows, the process by which we analyze stack- and heap-related problems is the same.

Heap Reuse After Deletion

Next to heap overruns, heap reuse after deletion is the second most common source of heap corruptions. As you have already seen, after a heap block has been freed, it is put on the free lists (or look aside list) by the heap manager. From there on, it is considered invalid for use by the application. If an application uses the free block in any way, shape, or form, the state of the block on the free list will most likely be corrupted and the application will crash.

Before we take a look at some practical examples of heap reuse after free, let's review the deletion process. Figure 6.12 shows a hypothetical example of a heap segment.



Figure 6.12 Hypothetical example of a heap segment

The segment consists of two busy blocks (B1 and B2) whose user-accessible part is surrounded by their associated metadata. Additionally, the free list contains one free block (Bx) of size 16. If the application frees block B1, the heap manager, first and foremost, checks to see if the block can be coalesced with any adjacent free blocks. Because there are no adjacent free blocks, the heap manager simply updates the status of the block (flags field of the metadata) to free and updates the corresponding free list to include B1. It is critical to note that the free list consists of a forward link (FLINK) and a backward link (BLINK) that each points to the next and previous free block in the list. Are the FLINK and BLINK pointers part of a separately allocated free list node? Not quite-for efficiency reasons, when a block is freed, the structure of the existing free block changes. More specifically, the user-accessible portion of the heap block is overwritten by the heap manager with the FLINK and BLINK pointers, each pointing to the next and previous free block on the free list. In our hypothetical example in Figure 6.12, B1 is inserted at the beginning of the free list corresponding to size 16. The user-accessible portion of B1 is replaced with a FLINK that points to Bx and a BLINK that points to the start of the list (itself). The existing free block Bx is also updated by the BLINK pointing to B1. Figure 6.13 illustrates the resulting layout after freeing block B1.



Figure 6.13 Heap segment and free lists after freeing B1

Next, when the application frees block B2, the heap manager finds an adjacent free block (B1) and coalesces both blocks into one large free block. As part of the coalescing process, the heap manager must remove block B1 from the free list since it no longer exists and add the new larger block to its corresponding free list. The resulting large block's user-accessible part now contains FLINK and BLINK pointers that are updated according to the state of the free list.

So far, we have assumed that all heap blocks freed make their way to the back end allocator's free lists. Although it's true that some free blocks go directly to the free lists, some of the allocations may end up going to the front end allocator's look aside list. Whena heap block goes into the look aside list, the primary differences can be seen in the heap block metadata:

  • Heap blocks that go into the look aside list have their status bit set to busy (in comparison to free in free lists)
  • The look aside list is a singly linked list (in comparison to the free lists doubly linked), and hence only the FLINK pointer is considered valid.

The most important aspect of freeing memory, as related to heap reuse after free, is the fact that the structure of the heap block changes once it is freed. The user-accessible portion of the heap block is now used for internal bookkeeping to keep the free lists upto- date. If the application overwrites any of the content (thinking the block is still busy), the FLINK and BLINK pointers become corrupt, and the structural integrity of the free list is compromised. The net result is most likely a crash somewhere down the road when the heap manager tries to manipulate the free list (usually during another allocate or free call).

Listing 6.10 shows an example of an application that allocates a block of memory and subsequently frees the block twice.

Listing 6.10 Simple example of double free

#include <windows.h>
#include <stdio.h>
#include <conio.h>
int __cdecl wmain (int argc, wchar_t* pArgs[])
{
printf("Press any key to start\n");
_getch();
BYTE* pByte=(BYTE*) HeapAlloc(GetProcessHeap(), 0, 10);
(*pByte)=10;
HeapFree(GetProcessHeap(), 0, pByte);
HeapFree(GetProcessHeap(), 0, pByte);
printf("Done...exiting application\n");
return 0;
}

The source code and binary for Listing 6.9 can be found in the following folders:

Source code: C:\AWD\Chapter6\DblFree
Binary: C:\AWDBIN\WinXP.x86.chk\06DblFree.exe

Running the application yields no errors:

C:\AWDBIN\WinXP.x86.chk\06DblFree.exe

To make sure that nothing out of the ordinary is happening, let's start the application under the debugger and make our way to the first heap allocation.
...
...
...
0:001> u wmain
06dblfree!wmain:
01001180 55 push ebp
01001181 8bec mov ebp,esp
01001183 51 push ecx
01001184 68a8100001 push offset 06dblfree!`string' (010010a8)
01001189 ff1548100001 call dword ptr [06dblfree!_imp__printf (01001048)]
0100118f 83c404 add esp,4
01001192 ff1550100001 call dword ptr [06dblfree!_imp___getch (01001050)]
01001198 6a0a push 0Ah
0:001> u
06dblfree!wmain+0x1a:
0100119a 6a00 push 0
0100119c ff1508100001 call dword ptr [06dblfree!_imp__GetProcessHeap
(01001008)]
010011a2 50 push eax
010011a3 ff1500100001 call dword ptr [06dblfree!_imp__HeapAlloc (01001000)]
010011a9 8945fc mov dword ptr [ebp-4],eax
010011ac 8b45fc mov eax,dword ptr [ebp-4]
010011af c6000a mov byte ptr [eax],0Ah
010011b2 8b4dfc mov ecx,dword ptr [ebp-4]
0:001> g 010011a9
eax=000830c0 ebx=7ffde000 ecx=7c9106eb edx=00080608 esi=01c7078e edi=83485b7a
eip=010011a9 esp=0006ff40 ebp=0006ff44 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
06dblfree!wmain+0x29:
010011a9 8945fc mov dword ptr [ebp-4],eax
ss:0023:0006ff40={msvcrt!__winitenv (77c61a40)}
Register eax now contains the pointer to the newly allocated block of memory:
0:000> dt _HEAP_ENTRY 000830c0-0x8
+0x000 Size : 3
+0x002 PreviousSize : 3
+0x000 SubSegmentCode : 0x00030003
+0x004 SmallTagIndex : 0x21 '!'
+0x005 Flags : 0x1 "
+0x006 UnusedBytes : 0xe "
+0x007 SegmentIndex : 0 "

Nothing seems to be out of the ordinary-the size fields all seem reasonable, and the flags field indicates that the block is busy. Now, continue execution past the first call to HeapFree and dump out the same heap block.

0:000> dt _HEAP_ENTRY 000830c0-0x8
+0x000 Size : 3
+0x002 PreviousSize : 3
+0x000 SubSegmentCode : 0x00030003
+0x004 SmallTagIndex : 0x21 '!'
+0x005 Flags : 0x1 "
+0x006 UnusedBytes : 0xe "
+0x007 SegmentIndex : 0 "

Even after freeing the block, the metadata looks identical. The flags field even has its busy bit still set, indicating that the block is not freed. The key here is to remember that when a heap block is freed, it can go to one of two places: look aside list or free lists. When a heap block goes on the look aside list, the heap block status is kept as busy. On the free lists, however, the status is set to free.

In our particular free operation, the block seems to have gone on the look aside list. When a block goes onto the look aside list, the first part of the user-accessible portion of the block gets overwritten with the FLINK pointer that points to the next available block on the look aside list. The user-accessible portion of our block resembles

0:000> dd 000830c0
000830c0 00000000 00080178 00000000 00000000
000830d0 000301e6 00001000 00080178 00080178
000830e0 00000000 00000000 00000000 00000000
000830f0 00000000 00000000 00000000 00000000
00083100 00000000 00000000 00000000 00000000
00083110 00000000 00000000 00000000 00000000
00083120 00000000 00000000 00000000 00000000
00083130 00000000 00000000 00000000 00000000

As you can see, the FLINK pointer in our case is NULL, which means that this is the first free heap block. Next, continue execution until right after the second call to HeapFree (of the same block). Once again, we take a look at the state of the heap block:

0:000> dt _HEAP_ENTRY 000830c0-0x8
+0x000 Size : 3
+0x002 PreviousSize : 3
+0x000 SubSegmentCode : 0x00030003
+0x004 SmallTagIndex : 0x21 '!'
+0x005 Flags : 0x1 "
+0x006 UnusedBytes : 0xe "
+0x007 SegmentIndex : 0 "

Nothing in the metadata seems to have changed. Block is still busy, and the size fields seem to be unchanged. Let's dump out the user-accessible portion and take a look at the FLINK pointer:

0:000> dd 000830c0
000830c0 000830c0 00080178 00000000 00000000
000830d0 000301e6 00001000 00080178 00080178
000830e0 00000000 00000000 00000000 00000000
000830f0 00000000 00000000 00000000 00000000
00083100 00000000 00000000 00000000 00000000
00083110 00000000 00000000 00000000 00000000
00083120 00000000 00000000 00000000 00000000
00083130 00000000 00000000 00000000 00000000

This time, FLINK points to another free heap block, with the user-accessible portion starting at location 000830c0. The block corresponding to location 000830c0 is the same block that we freed the first time. By double freeing, we have essentially managed to put the look aside list into a circular reference. The consequence of doing so can cause the heap manager to go into an infinite loop when subsequent heap operations force the heap manager to walk the free list with the circular reference. At this point, if we resume execution, we notice that the application finishes execution. Why did it finish without failing in the heap code? For the look aside list circular reference to be exposed, another call has to be made to the heap manager that would cause it to walk the list and hit the circular link. Our application was finished after the second HeapFree call, and the heap manager never got a chance to fail. Even though the failure did not surface in the few runs we did, it is still a heap corruption, and it should be fixed. Corruption of a heap block on the look aside list (or the free lists) can cause serious problems for an application. Much like the previous types of heap corruptions, double freeing problems typically surface in the form of post corruption crashes when the heap manager needs to walk the look aside list (or free list). Is there a way to use Application Verifier in this case, as well to trap the problem as it is occurring? The same heaps test setting used throughout the chapter also makes a best attempt at catching double free problems. By tagging the heap blocks in a specific way, Application Verifier is able to catch double freeing problems as they occur and break execution, allowing the developer to take a closer look at the code that is trying to free the block the second time. Let's enable full pageheap on our application and rerun it under the debugger. Right away, you will see a first chance access violation occur with the following stack trace:

0:000> kb
ChildEBP RetAddr Args to Child
0007fcc4 7c96ac47 00091000 005e4ff0 0007fcf8 ntdll!RtlpDphIsNormalHeapBlock+0x1c
0007fce8 7c96ae5a 00091000 01000002 00000000 ntdll!RtlpDphNormalHeapFree+0x1e
0007fd38 7c96defb 00090000 01000002 005e4ff0 ntdll!RtlpDebugPageHeapFree+0x79
0007fdac 7c94a5d0 00090000 01000002 005e4ff0 ntdll!RtlDebugFreeHeap+0x2c
0007fe94 7c9268ad 00090000 01000002 005e4ff0 ntdll!RtlFreeHeapSlowly+0x37
0007ff64 0100128a 00090000 00000000 005e4ff0 ntdll!RtlFreeHeap+0xf9
0007ff7c 01001406 00000001 0070cfd8 0079ef68 06DblFree!wmain+0x5a
0007ffc0 7c816fd7 00011970 7c9118f1 7ffd7000 06DblFree!__wmainCRTStartup+0x102
0007fff0 00000000 01001544 00000000 78746341 kernel32!BaseProcessStart+0x23

Judging from the stack, we can see that our wmain function is making its second call to HeapFree, which ends up access violating deep down in the heap manager code. Anytime you have this test setting turned on and experience a crash during a HeapFree call, the first thing you should check is whether a heap block is being freed twice. Because a heap block can go on the look aside list when freed (its state might still be set to busy even though it's considered free from a heap manager's perspective), the best way to figure out if it's really free is to use the !heap -p -a<heap block> command. Remember that this command dumps out detailed information about a page heap block, including the stack trace of the allocating or freeing code. Find the address of the heap block that we are freeing twice (as per preceding stack trace), and run the !heap extension command on it:

0:000> !heap -p -a 005d4ff0
address 005d4ff0 found in
_DPH_HEAP_ROOT @ 81000
in free-ed allocation ( DPH_HEAP_BLOCK: VirtAddr VirtSize)
8430c: 5d4000 2000
7c9268ad ntdll!RtlFreeHeap+0x000000f9
010011c5 06dblfree!wmain+0x00000045
0100131b 06dblfree!wmainCRTStartup+0x0000012f
7c816fd7 kernel32!BaseProcessStart+0x00000023

As you can see from the output, the heap block status is free. Additionally, the stack shows us the last operation performed on the heap block, which is the first free call made. The stack trace shown corresponds nicely to our first call to HeapFree in the wmain function. If we resume execution of the application, we notice several other first-chance access violations until we finally get an Application Verifier stop:

0:000> g
(1d4.6d4): Access violation - code c0000005 (first chance)

First chance exceptions are reported before any exception handling.

This exception may be expected and handled.

eax=0006fc7c ebx=00081000 ecx=00000008 edx=00000000 esi=005d4fd0 edi=0006fc4c
eip=7c969a1d esp=0006fc40 ebp=0006fc8c iopl=0 nv up ei pl nz na po cy
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010203
ntdll!RtlpDphReportCorruptedBlock+0x25:
7c969a1d f3a5 rep movs dword ptr es:[edi],dword ptr [esi]
es:0023:0006fc4c=00000000 ds:0023:005d4fd0=????????
0:000> g
(1d4.6d4): Access violation - code c0000005 (first chance)

First chance exceptions are reported before any exception handling.

This exception may be expected and handled.

eax=0006fc20 ebx=00000000 ecx=005d4ff0 edx=00000000 esi=00000000 edi=00000000
eip=7c968a84 esp=0006fc08 ebp=0006fc30 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246
ntdll!RtlpDphGetBlockSizeFromCorruptedBlock+0x13:
7c968a84 8b41e0 mov eax,dword ptr [ecx-20h] ds:0023:005d4fd0=????????
0:000> g
=======================================
VERIFIER STOP 00000008 : pid 0x1D4: Corrupted heap block.
00081000 : Heap handle used in the call.
005D4FF0 : Heap block involved in the operation.
00000000 : Size of the heap block.
00000000 : Reserved
=======================================
This verifier stop is not continuable. Process will be terminated
when you use the `go' debugger command.
=======================================
(1d4.6d4): Break instruction exception - code 80000003 (first chance)
eax=000001ff ebx=0040acac ecx=7c91eb05 edx=0006f959 esi=00000000 edi=000001ff
eip=7c901230 esp=0006f9ec ebp=0006fbec iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
ntdll!DbgBreakPoint:
7c901230 cc int 3

The last-chance Application Verifier stop shown gives some basic information about the corrupted heap block. If you resume execution at this point, the application will simply terminate because this is a nonrecoverable stop.

This concludes our discussion of the problems associated with double freeing memory. As you have seen, the best tool for catching double freeing problems is to use the heaps test setting (full pageheap) available in Application Verifier. Not only does it report the problem at hand, but it also manages to break execution at the point where the problem really occurred rather than at a post corruption stage, making it much easier to figure out why the heap block was being corrupted. Using full pageheap gives you the strongest possible protection level available for memory-related
problems in general. The means by which full pageheap is capable of giving you this protection is by separating the heap block metadata from the heap block itself. In a nonfull pageheap scenario, the metadata associated with a heap block is part of the heap block itself. If an application is off by a few bytes, it can very easily overwrite the metadata, corrupting the heap block and making it difficult for the heap manager to immediately report the problem. In contrast, using full pageheap, the metadata is kept in a secondary data structure with a one-way link to the real heap block. By using a one-way link, it is nearly impossible for faulty code to corrupt the heap block metadata, and, as such, full pageheap can almost always be trusted to contain intact information. The separation of metadata from the actual heap block is what gives full pageheap the capability to provide strong heap corruption detection.
 

Total Pages : 11 7891011

comments