.NET Assembly Internals: Part 1

This tutorial drills down into the details of how the CLR resolves the location of externally referenced assemblies.

Abstract

In this article series, we'll examine the core details of creating, deploying and configuring .NET assemblies and their advantages over existing COM technology. This article goes deeper into the role and format of .NET assemblies and modules. You'll explore assembly manifests, see how the .NET runtime resolves the location of an assembly and get an understanding of an assembly's CIL code. This article also explains the distinction between single-file and multi-file assemblies.

Problem with COM

Microsoft itself introduced the phrase “DLL Hell” to describe the traditional problem with existing COM DLLs. Often a newly installed application overwrites a DLL that is also used by another application, and the replacement breaks that other application. The problem occurs because the installation program does not check DLL versions properly, or because the new DLL is not backward compatible with the old version as it should be. Existing COM technology provides no side-by-side DLL installation feature. A DLL exposes functionality that is referenced from many places, and that functionality can stop working when the old version is replaced by a new version that behaves differently.

Side-by-side installation means that two versions of a single component can be installed at the same time. Something similar can be attempted with COM DLLs, but a problem remains: COM DLLs are not self-describing. The configuration of a COM component is stored in the registry, not in the component DLL itself. So the configuration information is taken from the most recently registered version rather than from each of the two versions of the DLL.

Understanding Assembly

The .NET Framework overcomes DLL Hell and the versioning issues of existing COM technology by introducing assemblies. An assembly is a self-describing installation unit consisting of one or more files. Virtually every binary that is built for and executed by the .NET Common Language Runtime (CLR) belongs to an assembly. An assembly file contains metadata and can be an .exe, a .dll or a resource file. Now, let's discuss some of the benefits that assemblies provide.

  1. Assemblies can be deployed as private or shared. Private assemblies reside in the directory of the application that uses them (or in a subdirectory of it). Shared assemblies, on the other hand, are libraries intended to be consumed by numerous applications on a single machine, so they are deployed to a central repository called the Global Assembly Cache (GAC).

  2. Every .NET assembly is assigned a four-part version number, which allows multiple versions of an assembly to run side by side. The version number has the form “<major>.<minor>.<build>.<revision>”.

  3. Assemblies are self-describing: the manifest records every external assembly that an assembly must have access to in order to function properly. The details of the contained types, such as member functions, variable names, base classes, interfaces and constructors, are placed in the metadata, so the CLR does not need to consult the Windows system registry to resolve their location.

  4. The .NET Framework lets you reuse types in a language-independent manner, so it does not matter which .NET language a code library was written in.

  5. Application isolation is ensured using application domains. A number of applications can run independently inside a single process, each in its own application domain.

  6. Installation of an assembly can be as simple as copying all of its files. Unlike COM, there is no need to register them in the Windows system registry.
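The four-part version number mentioned in item 2 can be explored with the framework's System.Version type. The following is a minimal sketch rather than anything from the original project; the class name VersionDemo and the helper IsNewer are hypothetical names chosen for illustration.

```csharp
using System;

public static class VersionDemo
{
    // Returns true when 'candidate' is a newer version than 'installed'.
    public static bool IsNewer(string installed, string candidate)
    {
        // System.Version parses the "<major>.<minor>.<build>.<revision>" form.
        Version a = Version.Parse(installed);
        Version b = Version.Parse(candidate);
        // Comparison goes major first, then minor, then build, then revision.
        return b > a;
    }

    public static void Main()
    {
        Version v = Version.Parse("1.2.3.4");
        Console.WriteLine("major=" + v.Major + " minor=" + v.Minor +
                          " build=" + v.Build + " revision=" + v.Revision);
        Console.WriteLine(IsNewer("1.0.0.0", "1.0.1.0")); // prints "True"
    }
}
```

Because each part is compared numerically, version 1.10.0.0 is treated as newer than 1.9.0.0, which a plain string comparison would get wrong.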

Modules

Before delving into assembly types in detail, let's discuss modules. An assembly is composed of one or more modules. A module is a compiled unit, much like a DLL, but without assembly attributes. To get a better understanding, we create a C# class library project as in the following.

    // A simple class that holds a first name and a last name.
    public class test
    {
        public test() { }

        public test(string fname, string lname)
        {
            this.FName = fname;
            this.LName = lname;
        }

        // Auto-implemented properties.
        public string FName { get; set; }
        public string LName { get; set; }

        // Returns the first and last name separated by a space.
        public override string ToString()
        {
            return FName + " " + LName;
        }
    }

A module can be created by csc.exe with the /target:module switch. The following command creates a module named test.netmodule:

csc /target:module test.cs

A module also has a manifest, but there is no ".assembly" entry inside it because a module has no assembly attributes. We can view a module manifest using the ildasm utility as in the following:

[Figure: module manifest shown in ildasm]

The main objective behind modules is faster startup of assemblies: because not all types are inside a single file, a module is loaded only when it is needed. The second reason is language mixing. If you want to create an assembly with more than one programming language, one module could be written in VB.NET and another in F#. Finally, these two modules can be combined into a single assembly.

Single file and Multifile Assembly

Technically speaking, an assembly can be formed from a single file or from multiple files. A single-file assembly contains all the necessary elements, such as the CIL code, headers, metadata and manifest, in a single *.exe or *.dll package.

[Figure: single-file assembly]

A multifile assembly, on the other hand, is a set of .NET modules that are deployed and versioned as a single unit. Formally speaking, these modules are called the primary module and the secondary modules. The primary module contains the assembly-level manifest, while the secondary modules have a *.netmodule extension and contain a module-level manifest. The major benefit of a multifile assembly is that it provides a very efficient way to download content, because a secondary module is loaded only when it is needed.

[Figure: multifile assembly]
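To see which modules make up an assembly at run time, reflection can be used. The sketch below relies on Assembly.GetModules(); the class name ModuleLister and the helper GetModuleNames are hypothetical. For a single-file assembly it is expected to report exactly one module, and for a multifile assembly the secondary *.netmodule files should appear as well.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class ModuleLister
{
    // Returns the file name of every module that makes up the given assembly.
    public static string[] GetModuleNames(Assembly assembly)
    {
        return assembly.GetModules().Select(m => m.Name).ToArray();
    }

    public static void Main()
    {
        // Inspect the assembly that contains this class.
        Assembly current = typeof(ModuleLister).Assembly;
        foreach (string name in GetModuleNames(current))
        {
            Console.WriteLine(name);
        }
    }
}
```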

Assembly Structure

An assembly comprises assembly metadata describing the complete assembly, type metadata describing the exported types and methods, CIL code (also known as MSIL) and resources. All these fragments can be inside one file or spread across several files. Structurally speaking, an assembly is composed of the following elements:

[Figure: assembly structure]

CIL code

CIL code is a CPU- and platform-agnostic intermediate language. It can be considered the backbone of an assembly. Given this design, .NET assemblies can execute on a variety of devices, architectures and operating systems. At run time, the Just-In-Time (JIT) compiler compiles the CIL into platform- and CPU-specific instructions.

[Figure: CIL code]

Understanding the grammar of CIL code can be helpful when you are building complex applications, although most .NET developers never need to be deeply concerned with its details.
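As a rough illustration that CIL is simply data stored inside the assembly, the raw CIL bytes of a method can be read back through reflection. This is a sketch under assumptions: the class IlPeek and its methods are hypothetical, and the exact bytes depend on the compiler and its optimization settings, so no particular output is claimed here beyond the fact that a method body ends with the ret opcode (0x2A).

```csharp
using System;
using System.Reflection;

public static class IlPeek
{
    // A tiny method whose compiled CIL we want to look at.
    public static int Add(int a, int b)
    {
        return a + b;
    }

    // Returns the raw CIL bytes of a public method declared on this class.
    public static byte[] GetIl(string methodName)
    {
        MethodInfo method = typeof(IlPeek).GetMethod(methodName);
        MethodBody body = method.GetMethodBody();
        return body.GetILAsByteArray();
    }

    public static void Main()
    {
        // Prints the bytes in hexadecimal, separated by dashes.
        byte[] il = GetIl("Add");
        Console.WriteLine(BitConverter.ToString(il));
    }
}
```

A disassembler such as ildasm.exe turns these same bytes into readable CIL mnemonics.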

Windows File Header

The Windows file header determines how the Windows family of operating systems can load and manipulate an assembly. The headers also identify the kind of application, such as DLL, console or GUI application, to be hosted by Windows. You can view the assembly header information using the dumpbin.exe utility as in the following:

dumpbin /headers CSharpTest.dll

[Figure: dumpbin /headers output]

CLR File Header

The CLR header is a block of data that all .NET assemblies must support in order to be hosted by the CLR. It typically defines numerous flags that enable the runtime to understand the layout of the managed code. We can view these flags by using dumpbin.exe again, this time with the /clrheader flag, as in the following:

[Figure: dumpbin /clrheader output]
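The same header information can also be read programmatically. The following sketch does not use dumpbin.exe; it swaps in the PEReader type from the System.Reflection.PortableExecutable namespace, which ships with modern .NET and is assumed to be available as the System.Reflection.Metadata package on the .NET Framework. The class name HeaderPeek and its methods are hypothetical.

```csharp
using System;
using System.IO;
using System.Reflection.PortableExecutable;

public static class HeaderPeek
{
    // Returns true when the file carries a CLR header, that is, when it is a managed assembly.
    public static bool HasClrHeader(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        using (PEReader reader = new PEReader(stream))
        {
            return reader.PEHeaders.CorHeader != null;
        }
    }

    // Prints a few fields from the Windows (PE/COFF) header and the CLR header.
    public static void Describe(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        using (PEReader reader = new PEReader(stream))
        {
            PEHeaders headers = reader.PEHeaders;
            Console.WriteLine("Is DLL: " + headers.IsDll);
            Console.WriteLine("Machine: " + headers.CoffHeader.Machine);
            if (headers.PEHeader != null)
            {
                // Subsystem distinguishes console applications from GUI applications.
                Console.WriteLine("Subsystem: " + headers.PEHeader.Subsystem);
            }
            if (headers.CorHeader != null)
            {
                Console.WriteLine("CLR flags: " + headers.CorHeader.Flags);
                Console.WriteLine("Runtime version: " + headers.CorHeader.MajorRuntimeVersion +
                                  "." + headers.CorHeader.MinorRuntimeVersion);
            }
        }
    }

    public static void Main()
    {
        // Inspect the core library that defines System.Object.
        Describe(typeof(object).Assembly.Location);
    }
}
```

A native DLL has the Windows header but no CLR header, so HasClrHeader is one simple way to tell managed and unmanaged binaries apart.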

Metadata

The .NET runtime uses metadata to resolve the location of types within the binary. An assembly's metadata comprehensively describes the format of the contained types, as well as the format of external type references. If you press the Ctrl+M keystroke combination, ildasm.exe displays the metadata for each type within the DLL assembly as in the following:

[Figure: metadata displayed by ildasm.exe]
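Reflection reads the same type metadata at run time. The sketch below lists the members of a small class; Person is a hypothetical stand-in modelled on the test class shown earlier, and MetadataPeek with its PropertyNames helper is likewise made up for illustration.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical sample type, similar in shape to the earlier test class.
public class Person
{
    public string FName { get; set; }
    public string LName { get; set; }

    public override string ToString()
    {
        return FName + " " + LName;
    }
}

public static class MetadataPeek
{
    private const BindingFlags Declared =
        BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly;

    // Returns the names of the public properties declared on a type, sorted by name.
    public static string[] PropertyNames(Type type)
    {
        return type.GetProperties(Declared)
                   .Select(p => p.Name)
                   .OrderBy(n => n, StringComparer.Ordinal)
                   .ToArray();
    }

    public static void Main()
    {
        Type t = typeof(Person);
        Console.WriteLine("Base type: " + t.BaseType);
        // Each member's kind (constructor, property, method) comes from the metadata.
        foreach (MemberInfo member in t.GetMembers(Declared))
        {
            Console.WriteLine(member.MemberType + " " + member.Name);
        }
    }
}
```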

Manifest

The assembly manifest documents each module within the assembly, establishes the version and records the external assemblies that the assembly depends on. The assembly manifest is a significant part of an assembly and is composed of the following parts:

  • Identity

    It includes version, name, culture and public key details.

  • Set of Permissions

    This portion lists the permissions necessary to run an assembly.

  • List of Files

    It lists all the files belonging to a single-file or multiple-file assembly.

  • External Reference Assemblies

    The manifest also documents the external reference files that are needed to run an assembly.

We can explore the assembly manifest using the ildasm.exe utility as in the following:

[Figure: assembly manifest in ildasm.exe]

Now, open the CSharpTest.dll manifest by double-clicking the MANIFEST icon. The first code block specifies all the external assemblies, such as mscorlib.dll, required by the current assembly to function correctly. Here, each .assembly extern block is qualified by the .publickeytoken and .ver directives as in the following.

[Figure: CSharpTest.dll manifest]
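The identity and external reference entries of a manifest can also be read through reflection. The following is a minimal sketch, assuming nothing about CSharpTest.dll itself: it inspects whichever assembly contains the hypothetical ManifestPeek class, and the helper name References is made up for illustration.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class ManifestPeek
{
    // Returns "Name, Version" for every assembly that the given assembly references.
    public static string[] References(Assembly assembly)
    {
        return assembly.GetReferencedAssemblies()
                       .Select(r => r.Name + ", " + r.Version)
                       .ToArray();
    }

    public static void Main()
    {
        Assembly current = typeof(ManifestPeek).Assembly;

        // Identity: name, version, culture and public key token.
        AssemblyName identity = current.GetName();
        Console.WriteLine("Name: " + identity.Name);
        Console.WriteLine("Version: " + identity.Version);
        Console.WriteLine("Culture: " +
            (string.IsNullOrEmpty(identity.CultureName) ? "neutral" : identity.CultureName));
        byte[] token = identity.GetPublicKeyToken();
        Console.WriteLine("Public key token: " +
            (token == null || token.Length == 0 ? "none" : BitConverter.ToString(token)));

        // External references, the counterpart of the .assembly extern blocks.
        foreach (string reference in References(current))
        {
            Console.WriteLine("Reference: " + reference);
        }
    }
}
```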

Typically, these settings reside in the project's AssemblyInfo.cs file, where they can be configured manually as in the following:

    using System.Reflection;
    using System.Runtime.CompilerServices;
    using System.Runtime.InteropServices;

    [assembly: AssemblyTitle("CsharpTest")]
    [assembly: AssemblyDescription("")]
    [assembly: AssemblyConfiguration("")]
    [assembly: AssemblyCompany("")]
    [assembly: AssemblyProduct("CsharpTest")]
    [assembly: AssemblyCopyright("Copyright ©  2013")]
    [assembly: AssemblyTrademark("")]
    [assembly: AssemblyCulture("")]
    [assembly: ComVisible(false)]

    // The following GUID is for the ID of the typelib if this project is exposed to COM
    [assembly: Guid("2fcf6717-f595-4216-bb93-f6590e37b3e5")]

    [assembly: AssemblyVersion("1.0.0.0")]
    [assembly: AssemblyFileVersion("1.0.0.0")]

Resources

Finally, a .NET assembly may contain multiple embedded resources, such as picture files, application icons, sound files and culture information (satellite assemblies for building international software).
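The resources embedded in an assembly are listed in its manifest and can be enumerated at run time. This is a small sketch using Assembly.GetManifestResourceNames(); the class name ResourceLister and its helper are hypothetical, and an assembly with no embedded resources is expected to produce an empty list.

```csharp
using System;
using System.Reflection;

public static class ResourceLister
{
    // Returns the names of the resources embedded in the given assembly.
    public static string[] EmbeddedResourceNames(Assembly assembly)
    {
        return assembly.GetManifestResourceNames();
    }

    public static void Main()
    {
        Assembly current = typeof(ResourceLister).Assembly;
        string[] names = EmbeddedResourceNames(current);
        Console.WriteLine(names.Length + " embedded resource(s)");
        foreach (string name in names)
        {
            Console.WriteLine(name);
        }
    }
}
```

Once a name is known, the content of that resource can be opened as a stream with Assembly.GetManifestResourceStream(name).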

Summary

This article drilled down into the details of how the CLR resolves the location of externally referenced assemblies. You began with the disadvantages of existing COM technology and then examined the content of an assembly: CIL code, headers, metadata, manifest and resources. You also learned the distinction between single-file and multi-file assemblies, and looked at the benefits of modules and assemblies. Later articles in this series explore more advanced topics related to assemblies.