Moving C Structures into .NET with Custom Marshaling

Figure 1 - Marshaling C to C#

In a world where legacy languages often prevail, we are reluctant to move away from them, reciting the well-known mantra, "If it ain't broke, don't fix it." This mantra only gets you so far. Eventually your competitor takes advantage of newer, faster, more efficient, and more maintainable technologies, and you are left in the dust with your pristine "ain't broke" legacy code.

One day, you find yourself sitting with your untouched, dust-collecting legacy code, when suddenly your boss tells you, "Convert this behemoth to .NET and do it by next Friday" (the behemoth being anything from C to COBOL). What strategy can you take to alleviate the pain of such a conversion?

Do it in Steps

It is often unnecessary, and sometimes impossible, to convert an entire project to a new technology all at once. Perhaps you can pick a simple interface to your legacy code and have .NET talk to it. If your code is unmanaged Windows code, the best strategy may be to marshal information back and forth between the unmanaged interface calls and .NET calls. "Well," you might say to yourself, "how can I do that? My code contains thousands of methods. I have an API with over a million lines of code." So let's do a simple calculation. You have to convert over a million lines of code (just the interface, now) in two weeks: 1,000,000 lines / (2 weeks x 5 programmer-days x 10 hours x 3,600 seconds) = 1,000,000 / 360,000 = 2.78 lines/second. Then you say to me, "I would have to convert 3 API calls every second and skip lunch." Then you start thinking back on your career dreams of being a simple desk clerk who can take longer lunches.

Use a Reverse Engineering Tool/Code Generator

There may be no long lunch in your future, but certainly you'll have time to scarf down a sandwich if you automate the generation of the marshaling code. For this example we will use the UML tool WithClass. Our legacy code is C and we would like to marshal the code to C#. We need to marshal both API calls and structures contained in those calls. In our example we will take the case where we have hundreds of structures and only a few API calls, so we only need to generate the marshaled code for the structures.

Microsoft provides several attributes for marshaling between C and C#. Here is an example that marshals the file-creation call exported by kernel32.dll in the Windows API.

Listing 1 - WIN32 API for creating a file

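using System;
using System.IO;
using System.Runtime.InteropServices;

// Declared inside a class; maps the Win32 CreateFile export to a C# method
[DllImport("Kernel32.dll")]
static extern IntPtr CreateFile(
    string filename,
    [MarshalAs(UnmanagedType.U4)] FileAccess fileaccess,
    [MarshalAs(UnmanagedType.U4)] FileShare fileshare,
    IntPtr securityattributes, // a pointer, so IntPtr rather than int
    [MarshalAs(UnmanagedType.U4)] FileMode creationdisposition,
    int flags,
    IntPtr template);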

The idea behind marshaling is that when you cross over from the world of managed code into the world of unmanaged code, you transfer the same number of bytes, laid out the way the unmanaged world wants to see them. You can marshal just about anything in .NET: arrays, structures, and simple types. The problem with the current marshaling support appears when you attempt to marshal things like arrays of structures, or structures containing arrays of structures. .NET does not provide a simple way of doing this through attributes. So what do you do? Do you only call unmanaged APIs that don't contain complex structures? Do you throw up your hands and say, "Forget it, I'll just rewrite the unmanaged code and have a solution two years from now"? Fortunately, with software, there is always a way.
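For the simple cases that attributes do handle, the "same number of bytes" idea can be checked directly. The sketch below is only an illustration: AccountNative and SizeDemo are hypothetical names, and the C layout in the comment (a packed struct holding a short and an int) is an assumption chosen for the example.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical managed mirror of an assumed C struct:
//   #pragma pack(push, 1)
//   struct Account { short id; int accountNumber; };
//   #pragma pack(pop)
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct AccountNative
{
    public short Id;          // 2 bytes at offset 0
    public int AccountNumber; // 4 bytes at offset 2
}

public static class SizeDemo
{
    // Returns the number of bytes the marshaler will hand to unmanaged code.
    public static int PackedSize()
    {
        return Marshal.SizeOf(typeof(AccountNative));
    }

    public static void Main()
    {
        Console.WriteLine(PackedSize()); // prints 6
    }
}
```

Without Pack = 1 the default packing would align the int on a 4-byte boundary and the reported size would grow to 8, so the managed declaration has to agree with whatever packing the C compiler used.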

Flattening your Data

Every simple type, structure, or array is nothing more than a contiguous set of bytes. An integer consists of 4 bytes (32 bits), and a short consists of 2 bytes (16 bits). An array of shorts with 5 elements can be represented in 10 bytes (2 bytes x 5). An array of structs of size 14 bytes with 3 elements is 42 bytes (14 bytes x 3). No matter what type you are using, in the end, it's just a bunch of bytes. For complex structures with arrays of structures, you can fill a byte array and send it directly to the unmanaged structure. As long as each value in the byte array sits at the position the unmanaged structure expects for it (including any padding the C compiler inserts), the unmanaged structure will be filled correctly. In other words, do your own custom marshaling rather than relying on the attributes.
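To make the byte counting concrete, here is a minimal sketch of flattening data by hand. FlattenDemo and its methods are hypothetical names, and the 6-byte record (a short followed by an int, with no padding) is an assumed layout used only for illustration.

```csharp
using System;

public static class FlattenDemo
{
    // Flattens an array of shorts: 5 elements become 10 bytes (2 bytes x 5).
    public static byte[] FlattenShorts(short[] values)
    {
        byte[] bytes = new byte[values.Length * sizeof(short)];
        Buffer.BlockCopy(values, 0, bytes, 0, bytes.Length);
        return bytes;
    }

    // Flattens an array of assumed 6-byte records (short id + int number).
    public static byte[] FlattenRecords(short[] ids, int[] numbers)
    {
        const int recordSize = 6;
        byte[] bytes = new byte[ids.Length * recordSize];
        for (int i = 0; i < ids.Length; i++)
        {
            int start = i * recordSize; // each record begins at a multiple of 6
            BitConverter.GetBytes(ids[i]).CopyTo(bytes, start);
            BitConverter.GetBytes(numbers[i]).CopyTo(bytes, start + 2);
        }
        return bytes;
    }

    public static void Main()
    {
        byte[] shorts = FlattenShorts(new short[] { 1, 2, 3, 4, 5 });
        byte[] records = FlattenRecords(new short[] { 1, 2, 3 }, new int[] { 100, 200, 300 });
        Console.WriteLine(shorts.Length);  // prints 10
        Console.WriteLine(records.Length); // prints 18
    }
}
```

Both helpers use the byte order of the machine they run on, which is what you want when the unmanaged code runs in the same process.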

Mirroring your Structures in Managed Code

One strategy you might take for creating your structures on the managed side is to have all your classes simply contain a buffer of bytes equal to the number of bytes in the unmanaged structure. Then you simply "generate" a set of properties that are capable of reading and writing the buffer for a particular piece of data in the unmanaged code at the exact position in the buffer corresponding to the unmanaged data. Your class might look something like this:

Listing 2 - A flattened class for marshaling managed to unmanaged code

public class Account : CustomMarshalObject
{
    // 6 bytes: a 2-byte Id followed by a 4-byte AccountNumber
    MarshalBuffer _buffer = new MarshalBuffer(6);

    public short Id
    {
        get
        {
            return _buffer.ReadShort(0); // this is the first value in the structure so the offset is 0
        }
        set
        {
            _buffer.WriteShort(value, 0);
        }
    }

    public int AccountNumber
    {
        get
        {
            return _buffer.ReadInt(2); // Account number starts at an offset of 2 bytes after the Id
        }
        set
        {
            _buffer.WriteInt(value, 2);
        }
    }
}

Our MarshalBuffer class contains a buffer of bytes and has methods to read and write every type in which we are interested. In order to read or write the byte in the correct position, we simply pass the offset into the buffer where the unmanaged byte would sit.

The MarshalBuffer class consists simply of a buffer and several methods that know how to read and write it. For example, the MarshalBuffer might look something like the code below:

Listing 3 - A special buffer class for moving simple types into and out of a byte array

using System;

public class MarshalBuffer
{
    /// <summary>
    /// Buffer holding the bytes of the structure
    /// </summary>
    byte[] _buffer = null;

    /// <summary>
    /// Constructs a buffer by passing it the size of
    /// the buffer
    /// </summary>
    /// <param name="size">size of the buffer in bytes</param>
    public MarshalBuffer(uint size)
    {
        _buffer = new byte[size];
    }

    /// <summary>
    /// Constructs a MarshalBuffer with an existing buffer
    /// </summary>
    /// <param name="buffer">existing byte array to wrap</param>
    public MarshalBuffer(byte[] buffer)
    {
        _buffer = buffer;
    }

    /// <summary>
    /// Writes a short value to the buffer
    /// at a particular offset
    /// </summary>
    /// <param name="val">value to write</param>
    /// <param name="offset">offset to start writing the short
    /// value</param>
    public void WriteShort(short val, uint offset)
    {
        // write low byte first (little-endian, as on x86)
        Buffer.SetByte(_buffer, (int)offset++, (byte)val);
        // write high byte
        Buffer.SetByte(_buffer, (int)offset, (byte)(val >> 8));
    }

    /// <summary>
    /// Reads a short value from the buffer at a particular
    /// offset
    /// </summary>
    /// <param name="offset">offset of the short value</param>
    /// <returns>the value read</returns>
    public short ReadShort(uint offset)
    {
        return BitConverter.ToInt16(_buffer, (int)offset);
    }

    /// <summary>
    /// Writes an int value to the buffer
    /// at a particular offset
    /// </summary>
    /// <param name="val">value to write</param>
    /// <param name="offset">offset to start writing the int
    /// value</param>
    public void WriteInt(int val, uint offset)
    {
        // lowest byte first, then each higher byte in turn
        Buffer.SetByte(_buffer, (int)offset++, (byte)val);
        Buffer.SetByte(_buffer, (int)offset++, (byte)(val >> 8));
        Buffer.SetByte(_buffer, (int)offset++, (byte)(val >> 16));
        Buffer.SetByte(_buffer, (int)offset, (byte)(val >> 24));
    }

    /// <summary>
    /// Reads an int value from the buffer at a particular
    /// offset
    /// </summary>
    /// <param name="offset">offset of the int value</param>
    /// <returns>the value read</returns>
    public int ReadInt(uint offset)
    {
        return BitConverter.ToInt32(_buffer, (int)offset);
    }

    ...
}
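A quick round trip shows what ends up in the byte array. The sketch below is self-contained, so it uses DemoBuffer and DemoAccount, which are hypothetical, trimmed-down stand-ins for the buffer and flattened class described above rather than the classes themselves. It also reads with explicit shifts instead of BitConverter, which is a choice made here so that reads and writes agree on any host byte order.

```csharp
using System;

// Hypothetical stand-in for a marshal buffer: little-endian reads and writes.
public class DemoBuffer
{
    private readonly byte[] _bytes;
    public DemoBuffer(int size) { _bytes = new byte[size]; }
    public byte[] Bytes { get { return _bytes; } }

    public void WriteShort(short val, int offset)
    {
        _bytes[offset] = (byte)val;            // low byte
        _bytes[offset + 1] = (byte)(val >> 8); // high byte
    }

    public short ReadShort(int offset)
    {
        return (short)(_bytes[offset] | (_bytes[offset + 1] << 8));
    }

    public void WriteInt(int val, int offset)
    {
        for (int i = 0; i < 4; i++)
            _bytes[offset + i] = (byte)(val >> (8 * i));
    }

    public int ReadInt(int offset)
    {
        int result = 0;
        for (int i = 0; i < 4; i++)
            result |= _bytes[offset + i] << (8 * i);
        return result;
    }
}

// Hypothetical flattened class: 2-byte Id at offset 0, 4-byte number at offset 2.
public class DemoAccount
{
    private readonly DemoBuffer _buffer = new DemoBuffer(6);

    public short Id
    {
        get { return _buffer.ReadShort(0); }
        set { _buffer.WriteShort(value, 0); }
    }

    public int AccountNumber
    {
        get { return _buffer.ReadInt(2); }
        set { _buffer.WriteInt(value, 2); }
    }

    public byte[] RawBytes { get { return _buffer.Bytes; } }
}

public static class RoundTripDemo
{
    public static void Main()
    {
        DemoAccount account = new DemoAccount();
        account.Id = 0x0102;
        account.AccountNumber = 0x0A0B0C0D;
        // prints 02-01-0D-0C-0B-0A
        Console.WriteLine(BitConverter.ToString(account.RawBytes));
    }
}
```

The six bytes printed are exactly what a packed C struct with a short and an int would hold on a little-endian machine, which is why the array can be handed to unmanaged code as is.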

Generating Managed Code for Unmanaged Structures

Because each buffer position is tedious to calculate, the custom marshaled objects are best produced by a code generator that calculates the offsets from the position and size of the unmanaged types in the structure. Using WithClass, we can accomplish this by first writing a VBA (or C#) script to reverse engineer all the structures in the unmanaged code into classes. Then we need to write a more complicated script to generate all of our custom marshaled managed classes.

At this point you may be asking yourself why you should bother writing a tedious script to generate all this stuff. Wouldn't it be easier to do by hand? If you have just a few structures to marshal, then by all means, get out the calculator and start calculating offsets. If, however, you have a myriad of structures to marshal between managed and unmanaged code, I would not advise the manual method. Below are the advantages of creating automated scripts to marshal your code.

  1. The script can be used over and over again. Even if you add a new structure to your unmanaged code, it will automatically and accurately calculate all the byte offsets and sizes.
  2. If you make a major change to any of your existing structures, you can safely regenerate for all structures and know you "covered your bases".
  3. You can always make slight alterations to your script to account for new structures.

I just completed a project with over 800 structures to marshal, so there was no way I was writing my marshaled structures manually (lest I desired a bad case of carpal tunnel syndrome).

Admittedly, writing code to reverse engineer C code and generate C# code is a bit difficult. But I suspect that once finished, your scripts can be reused for eternity for the particular structures you are parsing.

It turns out that with WithClass's built-in parser object, reverse engineering structures in C is fairly easy. Below is the VBA code for reverse engineering all the C structures in your file. The code uses a simple state machine (in the form of a Select Case statement) to go through each token in the file and act accordingly, producing a WithClass class to hold each structure's information.

Listing 4 - VBA script for reverse engineering C structs into WithClass

Attribute VB_Name = "ReverseStructs"
Option Explicit
Option Base 1 'All array indexes start with 1
Private wcDocument As With_Class.Document
Private currentPackages As With_Class.Packages
Private currentPackage As With_Class.Package
Private currentClasses As With_Class.Classes
Private currentClass As With_Class.Class
Private currentAttributes As With_Class.Attributes
Private currentAttribute As With_Class.Attribute
Private currentOperations As With_Class.Operations
Private currentOperation As With_Class.Operation
Private wcDiagram As With_Class.ClassDiagram
Private sMsg As String
Private iFileID As Integer
Private sFileName As String
Private structParser As With_Class.Parser
Private mainClass As With_Class.Class
Private nestedClass As With_Class.Class
Sub ReverseFile()
' Reverse engineer the C header file:
' create a WithClass class for each structure found
Dim nextToken As String
Const idle_state As Integer = 0
Const struct_state As Integer = 1
Const type_state As Integer = 2
Const name_state As Integer = 3
Const array_state As Integer = 4
Const typedef_name_state As Integer = 5
Const internal_struct_state As Integer = 6
Dim nestedStruct As Boolean
Dim x As Integer
Dim nextState As Integer
Dim y As Integer
' Set up WithClass Parsing Object
wcDocument = With_Class.ActiveDocument
wcDiagram = wcDocument.ClassDiagrams.Item(3)
' wcDocument.NewDiagram(0, "parsed diagram")
structParser = wcDocument.NewParser
structParser.WriteBufferFromFile("C:\MyStructureFile.h")
' Initial state is idle
nextState = idle_state
y = 20
' set up initial class position
x = 20
nestedStruct = False
' while we haven't reached the end of the file, continue
While (nextToken <> "NOTOK3N")
nextToken = structParser.GetNextNonSpaceToken
' skip comments
Do While (nextToken = "/" And IsNextNonSpaceToken("*"))
structParser.GotoToken("/")
nextToken = structParser.GetNextNonSpaceToken
If (nextToken = "NOTOK3N") Then
Exit Do
End If
Loop
' This is our state engine.
' A token is retrieved each time through the loop
' and acted upon in the current state
' States transition based on the next token.
' Actions are taken also based on the nexttoken
Select Case nextState
Case idle_state
If nextToken = "struct" Then
nextToken = structParser.GetNextNonSpaceToken
currentClass = wcDiagram.NewClass(x + 50, y, nextToken)
If (nextToken <> "{") Then
structParser.GotoToken("{") ' skip past bracket
End If
y = y + 50
If (y > 32000) Then
y = 0
x = x + 50
End If
nextState = type_state
End If
Case type_state
If nextToken = "struct" Then
' we found an internal structure
nestedStruct = True
nextState = internal_struct_state
ElseIf (nextToken = "}") Then ' reached the end of the struct, get the typedef
nextState = typedef_name_state
Else
currentAttribute = currentClass.NewAttribute("")
currentAttribute.Type = nextToken
nextState = name_state
End If
Case name_state
currentAttribute.name = nextToken
If (IsNextNonSpaceToken("[") = True) Then
nextState = array_state
ElseIf (IsNextNonSpaceToken(";") = True) Then
structParser.GetNextNonSpaceToken() ' skip past ;
nextState = type_state
End If
Case array_state
currentAttribute.IsArray = True
currentAttribute.length = structParser.GetLineToToken("]")
' shrink by one
currentAttribute.length = Left(currentAttribute.length, Len(currentAttribute.length) - 1)
structParser.GotoToken(";")
nextState = type_state
Case typedef_name_state
' extract the name into the class
If nestedStruct = True Then
' special handling for nested structures, tag on item name
currentClass.name = currentClass.name & nextToken
' also handle nested class field inside the main class
currentAttribute = mainClass.NewAttribute(nextToken)
' assign the generated type to the attribute type in the main class
currentAttribute.Type = currentClass.name
' test for array
If (IsNextNonSpaceToken("[")) Then
structParser.GetNextNonSpaceToken() ' skip bracket
currentAttribute.length = structParser.GetLineToToken("]")
' shrink by one
currentAttribute.length = Left(currentAttribute.length, Len(currentAttribute.length) - 1)
'its an array
currentAttribute.IsArray = True
End If
'create a relationship between the two classes
If (currentAttribute.IsArray) Then
wcDiagram.NewAggregation(mainClass, currentClass, currentAttribute.name, "One", "Many")
Else
wcDiagram.NewAggregation(mainClass, currentClass, currentAttribute.name, "One", "One")
End If
nestedStruct = False ' reset the nested class variable
currentClass = mainClass
structParser.GotoToken(";")
nextState = type_state
Else
currentClass.name = nextToken
structParser.GotoToken(";")
' skip past ;
nextState = idle_state
End If
Case internal_struct_state
' we need to do three things
' save reference to main class
mainClass = currentClass
' create a new class with the internal structure
currentClass = wcDiagram.NewClass(x, y, mainClass.name + "_")
' now we need to populate fields of nested struct
nextState = type_state
Case Else ' Other values.
End Select
End While
' arrange the classes
wcDocument.ArrangeClasses()
wcDocument.ArrangeRelationships()
End Sub
' This function checks the next non-space token against a test string without advancing the pointer in the buffer
Function IsNextNonSpaceToken(ByVal testtoken As String) As Boolean
Dim token As String
structParser.StorePtr()
token = structParser.GetNextNonSpaceToken
If (testtoken = token) Then
IsNextNonSpaceToken = True
Else
IsNextNonSpaceToken = False
End If
structParser.RestorePtr()
End Function

When the script is run on the file C:\MyStructureFile.h, it will produce a diagram of all your unmanaged C structures in WithClass. Now how do we produce C# from the diagram?

Code Generation Script

The way to produce code is to loop through all the structures in WithClass and keep a running count of sizes and offsets into the structures. As we loop through the structures, we write the information contained within them into our custom marshal classes in C#. The code generation script is too extensive for the scope of this article, so we will look at just one important piece: the AddAttributes function, which adds the fields to the class. This function loops through all the attributes of the class passed in and writes each attribute to the C# file as a property. The property produced looks similar to the properties in the custom marshal class in Listing 2. Each property contains a get accessor and a set accessor that read and write the byte buffer at the exact offset of the field, so the data is marshaled to unmanaged code in the right place. Although the code generation script looks complicated, it is mostly a lot of string concatenation to form the proper lines of code for each property. In the code below, the offset of each field is calculated and hard-coded into the generated code for its property.

Listing 5 - Code Generation Function for creating marshaling properties from UML Attributes in WithClass

Function AddAttributes(ByVal iFileID As Integer, ByVal currentClass As With_Class.Class) As Long
Dim size As Long
Dim increment As Integer
Dim offset As Long
Dim length As Integer
Dim attributeIsStructure As Boolean
Dim strConstruction As String
Dim strOffset As String
' clear this collection first
ClearCollection(structureConstructions)
ClearCollection(structureOffsets)
Dim attributeIsDate As Boolean
' Loop through all the attributes, and use the attribute data, to write each
' Property out to the C# file
currentClass.Attributes.Restart()
While (currentClass.Attributes.IsLast = False)
attributeIsDate = False
currentAttribute = currentClass.Attributes.GetNext
attributeIsStructure = IsStructure(currentAttribute.Type)
If (currentAttribute.Type = "char") Then
' string
size = 1
If currentAttribute.IsArray Then
size = currentAttribute.length ' compute the size of the string
Print #iFileID, " public string ";
Else
Print #iFileID, " " & " public " & currentAttribute.Type & " ";
End If
ElseIf attributeIsStructure = True Then
' calculate the size of the structure (recursively)
size = CalculateStructureSize(currentAttribute.Type)
If (currentAttribute.IsArray) Then ' need to marshal through byte array
' scale size to length of array
size = size * currentAttribute.length ' its an array of structures
Print #iFileID, " private " & AdjustTypeName(currentAttribute.Type) & "Array " & _
AdjustFieldName(currentAttribute.name) & "Array = null;"
strConstruction = " " & AdjustFieldName(currentAttribute.name) & _
"Array = new " & AdjustTypeName(currentAttribute.Type) & _
"Array(_buffer, _offset + " & offset & ", " & currentAttribute.length & " );"
strOffset = AdjustFieldName(currentAttribute.name) & "Array.Offset = " & _
offset & " + value;"
structureConstructions.Add(strConstruction)
' add to list of accessor constructors
structureOffsets.Add(strOffset)
Print #iFileID, " public " & AdjustTypeName(currentAttribute.Type) & _
"Array " & " ";
Else
Print #iFileID, " private " & AdjustTypeName(currentAttribute.Type) & _
" " & AdjustFieldName(currentAttribute.name) & "Accessor = null;"
strConstruction = AdjustFieldName(currentAttribute.name) & _
"Accessor = new " & AdjustTypeName(currentAttribute.Type) & _
"(_buffer, _offset + " & offset & " );"
strOffset = AdjustFieldName(currentAttribute.name) & "Accessor.Offset = " & _
offset & " + value;"
structureConstructions.Add(strConstruction)
' add to list of accessor constructors
structureOffsets.Add(strOffset)
Print #iFileID, " public " & AdjustTypeName(currentAttribute.Type) & " ";
End If
Else
Print #iFileID, " public " & ConvertType(currentAttribute, increment) & " ";
size = increment
End If
Dim tmp As Integer
Print #iFileID, AdjustTypeName(currentAttribute.name);
' add array tag for structure arrays
If (attributeIsStructure And currentAttribute.IsArray) Then
Print #iFileID, "Array"
Else
Print #iFileID, ""
End If
' write out the get accessor of the property
Print #iFileID, " {"
Print #iFileID, " get"
Print #iFileID, " {"
If (attributeIsStructure = False) Then
If attributeIsDate = True Then
HandleDateGet(iFileID, currentAttribute, offset)
Else
Print #iFileID, " return _buffer.Read";
Print #iFileID, GetParamString(currentAttribute) & "(" & offset & " + _offset ";
If (currentAttribute.IsArray) Then
' special handling for strings
Print #iFileID, "," & currentAttribute.length;
End If
Print #iFileID, ");"
End If
Else
' attribute is structure, just return accessor member
If currentAttribute.IsArray Then
' if its a structure array, return array variable
Print #iFileID, " return " & AdjustFieldName(currentAttribute.name) & _
"Array;"
Else
Print #iFileID, " return " & AdjustFieldName(currentAttribute.name) & _
"Accessor;"
End If
End If
Print #iFileID, " }"
If (attributeIsStructure = False) Then ' don't create sets for structures
' write out the set modifier of the property
Print #iFileID, " set"
Print #iFileID, " {"
If attributeIsDate = True Then
' special handling for date
HandleDateSet(iFileID, currentAttribute, offset)
Else
Print #iFileID, " _buffer.Write";
Print #iFileID, GetParamString(currentAttribute) & "(" & "value," & offset & _
" + _offset ";
If (currentAttribute.IsArray) Then
' special handling for strings
Print #iFileID, "," & currentAttribute.length;
End If
Print #iFileID, ");"
End If
Print #iFileID, " }"
End If
Print #iFileID, " }"
offset = offset + size
' calculate next offset into the byte array
Print #iFileID, ""
End While
AddAttributes = offset ' return the size
End Function
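The heart of the function above is the running offset: emit a property at the current offset, then advance the offset by the size of the field. Here is a minimal C# sketch of just that idea. OffsetSketch, Field, and EmitProperties are hypothetical names, and the sketch handles only short and int fields, so it is far simpler than the real script.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class OffsetSketch
{
    // Hypothetical field description: C# type, property name, size in bytes.
    public sealed class Field
    {
        public readonly string Type;
        public readonly string Name;
        public readonly int Size;

        public Field(string type, string name, int size)
        {
            Type = type;
            Name = name;
            Size = size;
        }
    }

    // Writes one property per field and returns the total structure size.
    public static int EmitProperties(IList<Field> fields, StringBuilder output)
    {
        int offset = 0;
        foreach (Field f in fields)
        {
            // Only short and int are handled in this sketch.
            string suffix = f.Type == "short" ? "Short" : "Int";
            output.AppendLine("public " + f.Type + " " + f.Name);
            output.AppendLine("{");
            output.AppendLine("    get { return _buffer.Read" + suffix + "(" + offset + "); }");
            output.AppendLine("    set { _buffer.Write" + suffix + "(value, " + offset + "); }");
            output.AppendLine("}");
            offset += f.Size; // next field starts where this one ends
        }
        return offset;
    }

    public static void Main()
    {
        List<Field> fields = new List<Field>();
        fields.Add(new Field("short", "Id", 2));
        fields.Add(new Field("int", "AccountNumber", 4));

        StringBuilder output = new StringBuilder();
        int total = EmitProperties(fields, output);
        Console.Write(output.ToString());
        Console.WriteLine("// total size: " + total);
    }
}
```

Because the offsets are computed once at generation time, the generated properties contain plain constants and do no layout arithmetic at run time.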

Conclusion

Although I didn't cover the rules for passing and receiving data between managed and unmanaged code (marshaling), you'll need to keep them in mind. Remember that a garbage collector on the managed side may move or collect your data, so if unmanaged code needs to write into a buffer, make sure you use Marshal.AllocHGlobal on the .NET side to allocate that buffer so the garbage collector can't touch it. (PInvoke supposedly pins the data automatically, but I'm not sure it pins data that is referenced through pointers inside that data.) Also, in hindsight, because WithClass is a COM object, the scripts for reverse engineering and code generation could just as easily have been written in C# (and probably would have been much more readable). Anyway, when the Marshal(er) is in town and you want to wrestle with the legacy of Jess-C-James and the COBOL Kid, you can always put up a fight with class in the world of .NET.
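As a minimal sketch of the Marshal.AllocHGlobal approach, the code below copies a flattened byte array into unmanaged memory and back. UnmanagedBufferDemo and RoundTrip are hypothetical names, and the unmanaged call itself is left as a comment because it depends on your own API.

```csharp
using System;
using System.Runtime.InteropServices;

public static class UnmanagedBufferDemo
{
    // Copies a managed byte array into unmanaged memory and back again.
    public static byte[] RoundTrip(byte[] managed)
    {
        // Memory from AllocHGlobal is outside the garbage collector's reach.
        IntPtr native = Marshal.AllocHGlobal(managed.Length);
        try
        {
            Marshal.Copy(managed, 0, native, managed.Length);

            // A P/Invoke call taking the IntPtr would go here; the unmanaged
            // code can read or fill the structure through that pointer.

            byte[] result = new byte[managed.Length];
            Marshal.Copy(native, result, 0, result.Length);
            return result;
        }
        finally
        {
            // Unmanaged memory is never collected, so free it explicitly.
            Marshal.FreeHGlobal(native);
        }
    }

    public static void Main()
    {
        byte[] copy = RoundTrip(new byte[] { 2, 1, 13, 12, 11, 10 });
        Console.WriteLine(BitConverter.ToString(copy)); // prints 02-01-0D-0C-0B-0A
    }
}
```

The try/finally matters here: if the unmanaged call throws, the buffer is still released.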

