Garbage Collector Algorithm

Anand Thakur
19y
103.4k
0
2

Article

Memory Management

One of the main sources of nasty, difficult- to finds bug on modern application is incorrect use of manual memory management. Like in C++, programmer would create an object and forget to delete it when they finished using it. This seemingly simple paradigm has been one of the major sources of programming errors. After all, how many times have you forgotten to free memory when it is no longer needed or attempted to use memory after you've already freed it? These leaks eventually consumed a process's entire memory space and caused it to crash.

Garbage Collection

Microsoft has made automatic memory management part of the .NET CLR, which allows it to be used from any .NET language. A programmer creates an object using the new operator and receives a reference to it. The CLR allocate that object's memory from managed heap. The Microsoft.NET common language runtime requires that all resources be allocated from the managed heap. An object to which all the references have been disappeared is called garbage and removed from heap. This entire operation is called garbage collection. In managed heap objects are automatically freed when they are no longer needed by the application. This, of course, raises the question: how does the managed heap know when an object is no longer in use by the application?

The Question

How does the garbage collector know if the application is using an object or not? It is not a simple question to answer. If any object exists, which is no longer used by any application, then the memory used by these objects can be reclaimed.

Application Root

Every application has a set of roots. Roots identify storage locations, which refer to objects on the managed heap or to objects that are set to null. For example, all the global and static object pointers in an application are considered part of the application's roots. In addition, any local variable/parameter object pointers on a thread's stack are considered part of the application's roots. Finally, any CPU registers containing pointers to objects in the managed heap are also considered part of the application's roots. The list of active roots is maintained by the just-in-time (JIT) compiler and common language runtime, and is made accessible to the garbage collector's algorithm.

The Algorithm

GCs only occur when the heap is full. When the garbage collector starts running, it makes the assumption that all objects in the heap are garbage. In other words, it assumes that none of the application's roots refer to any objects in the heap. Now, the garbage collector starts walking the roots and building a graph of all objects reachable from the roots. For example, the garbage collector may locate a global variable that points to an object in the heap.

Following Figure shows a heap with several allocated objects where the application roots 1 refer directly to objects Obj1, Obj2 and application root 2 refer to Obj4 and obj5. All of these objects become part of the graph. When adding object Obj2 of application root 1, the collector notices that this object refers to object Obj7 is also added to the graph. The collector continues to walk through all reachable objects recursively.

Garbage collector checks the all application roots and walks the objects again. As the garbage collector walks from object to object, if it attempts to add an object to the graph that it previously added, then the garbage collector can stop walking down that path. This serves two purposes. First, it helps performance significantly since it doesn't walk through a set of objects more than once. Second, it prevents infinite loops should you have any circular linked lists of objects.

Once all the roots have been checked, the garbage collector's graph contains the set of all objects that are somehow reachable from the application's roots; any objects that are not in the graph are not accessible by the application, and are therefore considered garbage. The garbage collector now walks through the heap linearly, looking for contiguous blocks of garbage objects (now considered free space). The garbage collector then shifts the non-garbage objects down in, removing all of the gaps in the heap. Of course, moving the objects in memory invalidates all pointers to the objects. So the garbage collector must modify the application's roots so that the pointers point to the objects' new locations. In addition, if any object contains a pointer to another object, the garbage collector is responsible for correcting these pointers as well.

After all the garbage has been identified, all the non-garbage has been compacted, and all the non-garbage pointers have been fixed-up, the NextObjPtr is positioned just after the last non-garbage object. At this point, the new operation is tried again and the resource requested by the application is successfully created.

There are a few important things to note at this point. You no longer have to implement any code that manages the lifetime of any resources that your application uses. It is not possible to leak resources, since any resource not accessible from your application's roots can be collected at some point. Second, it is not possible to access a resource that is freed, since the resource won't be freed if it is reachable. If it's not reachable, then your application has no way to access it.

Finalization

The garbage collector offers an additional feature that you may want to take advantage of: finalization. Finalization allows a resource to gracefully clean up after itself when it is being collected. By using finalization, a resource representing a file or network connection is able to clean itself up properly when the garbage collector decides to free the resource's memory.

Conclusion

The motivation for garbage-collected environments is to simplify memory management for the developer. Garbage collection algorithm shows how resources are allocated, how automatic garbage collection works, how to use the finalization feature to allow an object to clean up after itself.