.NET Assembly Internals: Part 1

Abstract
 
This series examines how .NET assemblies are created, deployed, and configured, and what advantages they offer over the older COM technology. This first part looks at the role and format of .NET assemblies and modules. You'll explore the assembly manifest, see how the .NET runtime resolves the location of an assembly, and get an overview of the CIL code an assembly contains. The article also explains the distinction between single-file and multi-file assemblies.
 
Problem with COM
 
Microsoft itself introduced the phrase “DLL Hell” to describe a long-standing problem with COM DLLs. An application breaks when a newly installed application overwrites a DLL that the first application also uses, replacing the old version with a new one. The problem arises when the installation program does not check DLL versions properly, or when the new DLL is not backward compatible with the old version. COM provides no side-by-side installation of DLLs, so when a DLL that is referenced by several applications is replaced by a version with changed functionality, every application that depended on the old behavior is affected.
 
Side-by-side installation means that two versions of the same component can be installed at the same time. Although this can be attempted with COM DLLs, a problem remains: COM DLLs are not self-describing. The configuration of a COM component is stored in the registry, not in the component DLL itself, so the configuration information comes from whichever version was registered last rather than from both versions simultaneously.
 
Understanding Assembly
 
The .NET Framework overcomes DLL Hell and the versioning issues of COM by introducing assemblies. An assembly is a self-describing installation unit consisting of one or more files. Virtually every binary that is built for and executed under the .NET Common Language Runtime (CLR) is an assembly. An assembly contains metadata and can be made up of .exe, .dll, and resource files. Let's look at the main benefits that assemblies provide.
  1. Assemblies can be deployed as private or shared. Private assemblies reside in the directory of the application that uses them (or a subdirectory of it). Shared assemblies, on the other hand, are libraries intended to be consumed by numerous applications on a single machine, so they are deployed to a central repository called the Global Assembly Cache (GAC).

  2. Every .NET assembly is assigned a four-part version number, which allows multiple versions of an assembly to run side by side. The version number has the form “<major>.<minor>.<build>.<revision>”.

  3. Assemblies are self-describing. The manifest records every external assembly that the assembly needs in order to function properly, and the metadata describes the contained types in detail, including member functions, variable names, base classes, interfaces, and constructors. As a result, the CLR does not need to consult the Windows system registry to resolve an assembly's location.

  4. The .NET Framework lets you reuse types in a language-independent manner, regardless of how a code library is packaged.

  5. Application isolation is ensured using application domains. Several applications can run independently inside a single process, each in its own application domain.

  6. Installation of an assembly can be as simple as copying all of its files. Unlike COM, there is no need to register them in the Windows system registry.
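
The four-part version number from item 2 can be explored with the framework's own System.Version type. The following is only a minimal sketch: the class name VersionDemo and its helper methods are made up for illustration and are not part of the article's project.

```csharp
using System;

// Hypothetical helper class, introduced only for this illustration.
public static class VersionDemo
{
    // Parses a "<major>.<minor>.<build>.<revision>" string into a System.Version.
    public static Version Parse(string text)
    {
        return Version.Parse(text);
    }

    // Returns true when 'candidate' is a strictly higher version than 'installed'.
    public static bool IsNewer(string candidate, string installed)
    {
        return Parse(candidate) > Parse(installed);
    }

    public static void Main()
    {
        Version v = Parse("1.2.3.4");
        Console.WriteLine("major={0} minor={1} build={2} revision={3}",
            v.Major, v.Minor, v.Build, v.Revision);

        // Each part is compared numerically, so 1.10 is newer than 1.9.
        Console.WriteLine(IsNewer("1.10.0.0", "1.9.0.0")); // True
    }
}
```

Comparing the parts as numbers rather than as strings is what makes it possible to tell two side-by-side versions apart reliably.
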
Modules
 
Before delving into assembly types, let's discuss modules. An assembly is composed of one or more modules. A module is a compiled unit of code that carries no assembly manifest of its own, that is, a DLL without assembly attributes. To get a better understanding, we create a C# class library containing the following class.

    public class test
    {
        public test() { }

        public test(string fname, string lname)
        {
            this.FName = fname;
            this.LName = lname;
        }

        public string FName { get; set; }

        public string LName { get; set; }

        public override string ToString()
        {
            return FName + " " + LName;
        }
    }

A module can be created with csc.exe by using the "/target:module" switch. The following command creates a module named test.netmodule:
 
csc /target:module test.cs
 
A module also has a manifest, but there is no ".assembly" entry inside the manifest because a module has no assembly attribute. We can view a module manifest using the ildasm utility as in the following:
 
[Figure: module manifest displayed in ildasm]
 
Modules serve two main purposes. First, they allow faster startup of an assembly: because not all types are inside a single file, a module is loaded only when it is needed. Second, they let you build an assembly from more than one programming language; for example, one module could be written in VB.NET and another in F#. These modules can then be combined into a single assembly.
 
Single file and Multifile Assembly
 
Technically speaking, an assembly can be formed from a single file or from multiple files. A single-file assembly contains all the necessary elements, such as CIL code, headers, and manifest, in a single *.exe or *.dll package.
 
[Figure: single-file assembly]
 
A multifile assembly, on the other hand, is a set of .NET modules that are deployed and versioned as a single unit. Formally speaking, these modules are called primary and secondary modules. The primary module contains the assembly-level manifest, and the secondary modules, which have a *.netmodule extension, each contain a module-level manifest. The major benefit of a multifile assembly is that it provides an efficient way to download content, because a secondary module needs to be fetched only when it is actually used.
 
[Figure: multifile assembly]
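
To see which modules make up a loaded assembly, reflection can be used. This is a minimal sketch; the class name ModuleLister is hypothetical. For a single-file assembly it is expected to report just one module, while a multifile assembly would list the primary module along with its *.netmodule files.

```csharp
using System;
using System.Reflection;

// Hypothetical helper class, introduced only for this illustration.
public static class ModuleLister
{
    // Returns the file names of all modules that belong to the assembly.
    public static string[] ModuleNames(Assembly assembly)
    {
        Module[] modules = assembly.GetModules();
        string[] names = new string[modules.Length];
        for (int i = 0; i < modules.Length; i++)
        {
            names[i] = modules[i].Name;
        }
        return names;
    }

    public static void Main()
    {
        Assembly current = typeof(ModuleLister).Assembly;

        // The manifest module is the primary module that holds the assembly manifest.
        Console.WriteLine("Manifest module: " + current.ManifestModule.Name);

        foreach (string name in ModuleNames(current))
        {
            Console.WriteLine("Module: " + name);
        }
    }
}
```
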
 
Assembly Structure
 
An assembly consists of assembly metadata describing the complete assembly, type metadata describing the exported types and methods, MSIL code, and resources. All these fragments can be inside one file or spread across several files. Structurally speaking, an assembly is composed of the following elements:
 
[Figure: assembly structure]
 
CIL code
 
CIL code is a CPU- and platform-agnostic intermediate language, and it can be considered the backbone of an assembly. Given this design, .NET assemblies can execute on a variety of devices, architectures, and operating systems. At runtime, the Just-In-Time (JIT) compiler translates the CIL into instructions specific to the platform and CPU.
 
[Figure: CIL code]
 
Understanding the grammar of CIL can be helpful when you are building a complex application, although most .NET developers rarely need to work with CIL directly.
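
The CIL that the compiler stored for a method can also be inspected from code. The sketch below uses reflection to fetch the raw CIL bytes of a small method; the class IlDump and its Add method are hypothetical names chosen for this example. The exact bytes depend on compiler settings (debug versus release), so no particular output is assumed here.

```csharp
using System;
using System.Reflection;

// Hypothetical helper class, introduced only for this illustration.
public static class IlDump
{
    // A tiny method whose CIL body we want to look at.
    public static int Add(int a, int b)
    {
        return a + b;
    }

    // Returns the raw CIL bytes of a method, or an empty array if it has no body.
    public static byte[] GetIl(MethodInfo method)
    {
        MethodBody body = method.GetMethodBody();
        return body == null ? new byte[0] : body.GetILAsByteArray();
    }

    public static void Main()
    {
        MethodInfo add = typeof(IlDump).GetMethod("Add");

        // Prints the CIL as hex bytes; the final byte 2A is the 'ret' opcode.
        Console.WriteLine(BitConverter.ToString(GetIl(add)));
    }
}
```

These bytes are what the JIT compiler turns into native instructions the first time the method is called.
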
 
Windows File Header
 
The Windows file header determines how the Windows family of operating systems can load and manipulate an assembly. The headers also identify the kind of application, such as DLL, console, or GUI application, to be hosted by Windows. You can view the assembly header information using the dumpbin.exe utility as in the following:
 
dumpbin /headers CSharpTest.dll
 
[Figure: dumpbin /headers output]
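
The same header fields that dumpbin reports can be read by hand. The following sketch assumes the standard PE/COFF layout (the "MZ" signature, the offset of the PE header stored at 0x3C, and the Characteristics and Subsystem fields) and a little-endian machine; the class name PeHeaderReader is hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical helper class, introduced only for this illustration.
public static class PeHeaderReader
{
    // Classifies a PE image as "DLL", "Console", "GUI", or "Other" from its raw bytes.
    public static string Describe(byte[] image)
    {
        if (image.Length < 0x40 || image[0] != 'M' || image[1] != 'Z')
            throw new InvalidDataException("Missing MZ signature.");

        // The DOS header stores the file offset of the PE header at 0x3C.
        int pe = BitConverter.ToInt32(image, 0x3C);
        if (pe < 0 || pe + 94 > image.Length ||
            image[pe] != 'P' || image[pe + 1] != 'E' ||
            image[pe + 2] != 0 || image[pe + 3] != 0)
            throw new InvalidDataException("Missing PE signature.");

        // COFF header: Characteristics sits 22 bytes after the PE signature.
        ushort characteristics = BitConverter.ToUInt16(image, pe + 22);
        // Optional header starts at pe + 24; Subsystem is at offset 68 within it.
        ushort subsystem = BitConverter.ToUInt16(image, pe + 24 + 68);

        if ((characteristics & 0x2000) != 0) return "DLL";   // IMAGE_FILE_DLL
        if (subsystem == 3) return "Console";                // IMAGE_SUBSYSTEM_WINDOWS_CUI
        if (subsystem == 2) return "GUI";                    // IMAGE_SUBSYSTEM_WINDOWS_GUI
        return "Other";
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: PeHeaderReader <path to .dll or .exe>");
            return;
        }
        Console.WriteLine(Describe(File.ReadAllBytes(args[0])));
    }
}
```
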
 
CLR File Header
 
The CLR header is a block of data that all .NET assemblies must support in order to be hosted by the CLR. It typically defines numerous flags that enable the runtime to understand the layout of the managed code. We can view these flags by running dumpbin.exe again, this time with the /clrheader option:
 
[Figure: dumpbin /clrheader output]
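
Some of this information is also available through reflection. As a rough sketch, Module.GetPEKind reports values such as ILOnly and Required32Bit, several of which mirror flags stored in the CLR header. The class name ClrHeaderFlags is hypothetical, and the result depends on how the assembly was built, so no specific output is assumed.

```csharp
using System;
using System.Reflection;

// Hypothetical helper class, introduced only for this illustration.
public static class ClrHeaderFlags
{
    // Returns the PE kind flags reported for the assembly's manifest module.
    public static PortableExecutableKinds KindOf(Assembly assembly)
    {
        PortableExecutableKinds kind;
        ImageFileMachine machine;
        assembly.ManifestModule.GetPEKind(out kind, out machine);
        return kind;
    }

    public static void Main()
    {
        // Typically prints a combination such as "ILOnly" for a plain C# assembly.
        Console.WriteLine(KindOf(typeof(ClrHeaderFlags).Assembly));
    }
}
```
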
 
Metadata
 
The .NET runtime uses metadata to resolve the location of types within the binary. Assembly metadata comprehensively describes the format of the contained types, as well as the format of external type references. If you press the Ctrl+M keystroke combination, ildasm.exe displays the metadata for each type within the assembly as in the following:
 
[Figure: type metadata displayed in ildasm]
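
The type metadata shown by ildasm.exe is the same information that reflection exposes at runtime. The sketch below lists the base class and the declared public members of a type; the class name MetadataBrowser is hypothetical, and System.Version is used only as a convenient sample type.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical helper class, introduced only for this illustration.
public static class MetadataBrowser
{
    // Builds a short textual description of a type from its metadata.
    public static List<string> DescribeType(Type type)
    {
        List<string> lines = new List<string>();
        string baseName = type.BaseType == null ? "(none)" : type.BaseType.FullName;
        lines.Add("type " + type.FullName + " : " + baseName);

        // Only members declared on the type itself, not inherited ones.
        BindingFlags flags = BindingFlags.Public | BindingFlags.Instance |
                             BindingFlags.Static | BindingFlags.DeclaredOnly;
        foreach (MemberInfo member in type.GetMembers(flags))
        {
            lines.Add("  " + member.MemberType + " " + member.Name);
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in DescribeType(typeof(Version)))
        {
            Console.WriteLine(line);
        }
    }
}
```
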
 
Manifest
 
The assembly manifest documents each module within the assembly, establishes the version of the assembly, and records the external assemblies it references as dependencies. The manifest is a significant part of an assembly and is composed of the following parts:
  • Identity
     
    It includes version, name, culture, and public key details.
     
  • Set of Permissions
     
    This portion displays the necessary permissions to run an assembly.
     
  • List of Files
     
    It lists all the files belonging to a single-file or multiple-file assembly.
     
  • External Reference Assemblies
     
    The manifest also documents the external reference files that are needed to run an assembly.
We can explore the assembly manifest using the ildasm.exe utility as in the following:
 
[Figure: assembly manifest in ildasm]
 
Now, open the CSharpTest.dll manifest by double-clicking the MANIFEST icon. The first code block specifies all the external assemblies, such as mscorlib.dll, required by the current assembly to function correctly. Here, each .assembly extern block is qualified by the .publickeytoken and .ver directives as in the following.
 
[Figure: .assembly extern blocks in the CSharpTest.dll manifest]
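
The identity and the external references recorded in the manifest can also be read programmatically. This is a minimal sketch using reflection; the class name ManifestReader is hypothetical, and the actual list of references depends on the runtime and on what the code uses.

```csharp
using System;
using System.Reflection;

// Hypothetical helper class, introduced only for this illustration.
public static class ManifestReader
{
    // Returns the name and version from the assembly's identity.
    public static string Identity(Assembly assembly)
    {
        AssemblyName name = assembly.GetName();
        return name.Name + ", Version=" + name.Version;
    }

    // Returns the full names of the assemblies listed as external references.
    public static string[] References(Assembly assembly)
    {
        AssemblyName[] referenced = assembly.GetReferencedAssemblies();
        string[] names = new string[referenced.Length];
        for (int i = 0; i < referenced.Length; i++)
        {
            // FullName includes the version, culture, and public key token.
            names[i] = referenced[i].FullName;
        }
        return names;
    }

    public static void Main()
    {
        Assembly current = typeof(ManifestReader).Assembly;
        Console.WriteLine(Identity(current));
        foreach (string reference in References(current))
        {
            Console.WriteLine("  references " + reference);
        }
    }
}
```
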
 
Typically, these settings reside in the project's AssemblyInfo.cs file, where they can be edited manually, as in the following:

    using System.Reflection;
    using System.Runtime.CompilerServices;
    using System.Runtime.InteropServices;

    [assembly: AssemblyTitle("CsharpTest")]
    [assembly: AssemblyDescription("")]
    [assembly: AssemblyConfiguration("")]
    [assembly: AssemblyCompany("")]
    [assembly: AssemblyProduct("CsharpTest")]
    [assembly: AssemblyCopyright("Copyright ©  2013")]
    [assembly: AssemblyTrademark("")]
    [assembly: AssemblyCulture("")]
    [assembly: ComVisible(false)]

    // The following GUID is for the ID of the typelib if this project is exposed to COM
    [assembly: Guid("2fcf6717-f595-4216-bb93-f6590e37b3e5")]

    [assembly: AssemblyVersion("1.0.0.0")]
    [assembly: AssemblyFileVersion("1.0.0.0")]

Resources
 
Finally, a .NET assembly may contain multiple embedded resources, such as picture files, application icons, sound files, and culture information (satellite assemblies for building international software).
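
The names of the resources embedded in an assembly can be listed with reflection. This is a small sketch; the class name ResourceLister is hypothetical, and an assembly without embedded resources simply yields an empty list.

```csharp
using System;
using System.Reflection;

// Hypothetical helper class, introduced only for this illustration.
public static class ResourceLister
{
    // Returns the names of all resources embedded in the assembly.
    public static string[] ResourceNames(Assembly assembly)
    {
        return assembly.GetManifestResourceNames();
    }

    public static void Main()
    {
        // Lists the resources of the core library as a sample.
        foreach (string name in ResourceNames(typeof(object).Assembly))
        {
            Console.WriteLine(name);
        }
    }
}
```
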
 

Summary

 
This article examined the internals of a .NET assembly. It began with the disadvantages of the existing COM technology and then examined the contents of an assembly: CIL code, headers, metadata, manifest, and resources. It also covered the distinction between single-file and multi-file assemblies and the benefits of modules and assemblies. Later parts of the series explore more advanced topics related to assemblies.