Garbage Collection In Depth

Jignesh Trivedi
10y
48.6k
0
7

Article

Introduction

A Garbage Collector is an automatic memory manager. Garbage collection is a process of releasing the memory used by the objects that are no longer referenced. It has the following advantages.

It allows us to develop an application without having to free memory.
It efficiently allocates an object on the managed heap.
It reclaims object memories that are no longer being used.
It provides memory safety by making sure that an object cannot use the content of another object.

The CLR initializes the garbage collector, then the CLR allocates a segment of memory to store and manage object(s). This is called the managed heap. The managed heap is for each managed process, in other words all threads in the process allocate memory for objects in the same heap memory. The two Win32 functions VirtualAlloc and VirtualFree are used by the garbage collector for allocating a segment of memory and releasing segments back to the operating system respectively. The garbage collector does less work if there are fewer objects allocated on the heap. When garbage collection is triggered, the garbage collector reclaims the memory that is used by dead objects (dead objects are objects that are no longer referenced).

Heap Generations

The heap is organized into generations, so it can handle long-lived objects and short-lived objects. The primarily garbage collector reclaims short-lived objects (that occupy only a small part of the heap). There are the following three generations of objects on the heap.

Generation 0

It contains short-lived objects like temporary variables. The garbage collection occurs most frequently in this generation.
Generation 1

It is also contains short-lived objects and works as a buffer between short lived objects and long-lived objects.
Generation 2

It contains long-lived objects like static data and large arrays.

Initially objects are created during Generation 0. The object is promoted to Generation 1 if they are alive when collection occurrs on Generation 0. The same objects are promoted in Generation 2 if they are alive when the collection occurrs on Generation 1.

Most of the objects (like local variables and parameters) are very temporary and go out of scope in a very short time, so we can collect these objects without having to go through to the other memory (Generation 1 and Generation 2), this is an advantage of this type of generational GC.

When a new object is allocated, it will first be allocated right after the last object on the heap in Generation 0 and will grow until Generation 0 has reached a point when garbage collection occurs. The cost of memory allocation is very small when objects are allocated in Generation 0. Generation 1 and 0 live in the same segment called the ephemeral segment and the size of them can never exceed the size of the segment. On the other hand Generation 2 can grow indefinitely until out of memory, so the object consumes a large amount of memory, it was created and lived in Generation 2.

When Garbage collection occurs

The system has low physical memory.
The GC. Collect () method is called by the program. Generally, we do not call this method because the garbage collector runs continuously.
The memory used by the object in the managed heap is exceeded to the acceptable threshold.

The collections don't happen at certain time intervals, in other words if our application is not allocating any memory or is not calling the GC.Collect () method, no collection will occur. One more important point is that the collection of the higher generations will occur only when they reach budgeted memory.

We must properly clean up the native resources like threads, connections and so on. If we are not properly cleaning up these native resources then objects will not be collected by the garbage collector even if they run out of scope.

Garbage collection goes through the following sequence:

Suspends all threads that are making .Net calls.
Collecting the information about a live object (accessible in the current process) from JIT, EE stackwalker, handle table and finalize queue. It marks all other inaccessible objects as garbage.
Delete all marked objects.
Compact the heap by moving unused objects to the backend of the heap. That is the most expensive part.
Resumes all threads.

Garbage Collection Algorithm

The garbage collector checks whether any objects in the heap memory are no longer being used by the application. If any such object exists, the memory used by this object can be reclaimed when no memory is available in the heap or the new operator throws an OutOfMemoryException.

Every application has a set of roots. Roots are nothing but storage locations in the managed heap. For example, all the static and global object pointers are to be considered to be the application’s roots. Additionally, local variables and parameters and any objects in the managed heap with CPU registers containing pointers to them are also considered to be the application's roots. The Just-In-Time (JIT) compiler and Common Language Runtime (CLR) has a list of active roots made accessible to the garbage collector's algorithm. When the garbage collector starts to collect, it assumes that none of the roots refer to any object in the heap. Now the garbage collector walks though the roots and creates a graph of all the objects reachable from the roots.

The following figure shows the heap with several allocated objects. Here the application roots 3, 4 and 6 refer directly and object 5 refers by 3 so the application roots 3, 4, 5 and 6 become part of the graph and the collector continuously walks through all the reachable objects recursively.

Now the group contains all the objects reachable from the application roots and any object that is not part of this graph (in other words it is not accessible by any application) are considered as garbage.

Now the garbage collector walks through the heap linearly and marks the nearby block of garbage object (it is considered free space) and shifts the non-garbage objects down in memory (using the memcpy function), removes all the gaps from the heap and updates all live object's pointers (application root pointer) to the new location. The garbage collector is responsible for correcting the application's root pointer. The garbage collector also sets the next object pointer to last non-garbage object.

The garbage collector generates a significant performance hit and this is a major drawback of using a managed heap. But the GC only occurs when the heap is full, so that the managed heap is significantly faster than a C Runtime heap.

Finalization

Finalization is an additional feature offered by the garbage collector that allows cleaning up resources gracefully when collected. The object is able to clean up itself properly when the garbage collector calls to free the resource’s memory; this is possible with object finalization. If the object has a finalize method then the garbage collector call its finalize method to reclaim object memory.

Object finalization differs from object destructors. The garbage collector keeps track of the objects that have Finalize methods. When any object is created that has a Finalize method, the garbage collector makes an entry in the finalization queue. The finalization queue contains entries for all the objects that need to call their finalization code before the garbage collector can reclaim their memory. The Destructor feature is not directly supported by the language C#, it is a predecessor of C++ on which this language evolved.

Public class BaseObject {
public BaseObject() {
}
protected override void Finalize() {
// Resource Cleanup Code
base.Finalize(); // Call base type's Finalize
}
}

Finalization seems tobe a very straightforward process. When the application creates an object using the new operator, it allocates memory from the heap. If the object has a finalized method then the pointer of the object is placed on the finalization queue. The finalization queue is an internal data structure and it is controlled by the garbage collector.

The following figure shows a heap containing several objects, some of them are reachable from the application’s roots and some are not. Objects 3, 4 and 5 have a “finalize” method, so the system will add them to the finalization queue when they are created. Here object 1 and 2 are garbage.

Now when GC is occurring, objects 3 and 4 are determined as garbage. The garbage collector scans the finalization queue and finds a pointer for these objects (3 and 4). If the pointer is found, the pointer is removed from the finalization queue and is moved to another queue called the freachable queue (The freachable queue is another internal data structure controlled by the garbage collector).

Here object 1 and 2 are directly removed from the memory because these objects do not have a finalize method. The memory occupied by objects 3 and 4 is not reclaimed because they have a finalized method and it's not called yet.

There is a dedicated runtime thread to call finalize methods. When the freachable queue is empty, this thread is in sleep mode. But when entries exist in the freachable queue, this thread wakes up and removes all entries from this queue and calls the finalize method for each object.

Why is this queue name the reachable queue? Here the "F" stands for "finalization". Every entry in this queue has a “finalize” method and should have this method called. The "reachable" part of the name means that the objects are reachable. If the object is in the reachable queue then the object is reachable and it is not garbage.

Summary

When the object is not reachable, the garbage collector considers it as garbage. When the garbage collector moves the object from the finalization queue to the freachable queue, the object is not considered as garbage and hence its memory is not reclaimed. The garbage collector rearranges the reclaimed memory and the dedicated runtime thread emptys the freachable queue and calls the object's finalize method.