ADO.NET: In-Depth Tutorial

Summary

 
This article assumes a working knowledge of ADO.NET and examines its influence on the middle tier. It introduces a fictitious business model and lists some requirements to help us focus on the middle tier. It also introduces a new mechanism called SQLData, which combines the ADO.NET DataSet and the ADO.NET DataReader into a single mechanism. The SQLData struct is used as the basis for developing a middle tier. Finally, the performance of the SQLData struct is measured and contrasted with a wizard-generated strongly typed DataSet. Many .NET features, such as interfaces, enumerators, and delegates, are used along the way.
 

The Middle Tier

 
In a traditional three-tiered approach to software development the middle tier, or business object layer as it is sometimes referred to, is the layer of architecture that models and enforces the business rules and/or data of an organization. A properly designed middle tier should express itself in a highly readable form, be robust, be resilient to change, and be distinct from the user interface. Generally, component-based architectures that can express the business rules and processes, as well as scale to meet demand, work well for creating middle-tier architecture. Enter .NET.
 
You may be asking, "Why, of all the new technologies and features in the Microsoft .NET Framework, should I focus on the middle tier? Hasn't enough attention already been paid to this area?" There are two main reasons this article focuses on the middle tier.
 
First, the introduction of the Microsoft .NET Framework and its comprehensive class library will attract many newcomers, converts, and old-time Microsoft C++ ATL COM guys like myself. At first glance, the .NET Framework can be overwhelming -- not with its complexity, but with the breadth of infrastructure only a .NET namespace away. The point here is that although technology is always changing, good software development practices span technologies. This article looks at the .NET Framework from the middle-tier point of view because that's where many of us labor. Also, many developers who previously worked only in ASP script are going to find themselves either working on or writing the middle tier, because the ASP.NET programming model uses the same first-class language(s) that the middle tier uses. ASP.NET developers will be writing .NET components -- and this is good for the industry, as more people will be available to work on core infrastructure. Many developers, new and old and with varying experience and backgrounds, will find themselves challenged by the task of writing a coherent middle tier -- whether they know that's what they're writing or not. Some mentoring will have to take place, and I hope this article will help some readers get a foothold in the .NET Framework.
 
Second, I have seen the introduction of XML cause confusion, especially with respect to its relationship to the middle tier. Some developers are even drifting, knowingly or unknowingly, back to what I call the monolithic application. Why write a middle-tier component when SQL Server 2000 can retrieve our data in native XML? We could then transform the data into HTML through XSL/XSLT and go directly to the browser. Why create a middle-tier component with all of its methods and properties? The answer to this question is an article in itself.
 
Suffice it to say that you have to be able to receive the data back from the browser. Do you receive the modified XML and simply stuff it directly into the database? If there are no business rules associated with the data that's a possibility; this is not very likely, however, even for the smallest application. There is nothing preventing a middle-tier component from offering itself up as XML data; the trick is to know how to handle the modified XML data when it's received, and to invoke the proper business logic.
My caution to the reader: Be careful if you are currently weighing the decision to develop a new project without a well-defined middle tier. This is not to say that it cannot be done successfully, but there is considerable risk involved in attempting to do so.
 
Our Data Model and Requirements
 
The example we are going to use throughout this article will be a very simple human resources middle-tier model. We will need four business objects: an Employee collection, an Employee object, a Dependents collection, and a Dependent object. An Employee can have zero to many Dependents. Let's look at our hierarchical model.
 
 
Figure 1  The Hierarchical Data Model
 
Our requirements are as follows. We want both our Employees and Dependents collections to be able to hold instances of our individual business objects. At the same time, we want to be able to create and work with our individual business objects outside of our collections in cases where we desire only one record of data. When working with read-only data, we want to use the ADO.NET DataReader class for performance reasons, and the ADO.NET DataSet class when we desire writeable data. We want to use these two different data access mechanisms without affecting the readability of our code. Finally, we want to avoid calling the new operator as much as possible, thereby reducing our garbage-collection and object-creation cycles (as you will see, these things still matter). Our collection classes must allow access to their objects only via the foreach syntax.
 
Where to Start?
 
The middle tier often has a lot to do with exposing a well-designed data model to the presentation layer. Consuming ADO.NET then becomes the first order of business for the middle-tier developer. The differences between ADO and ADO.NET are significant, and will probably be one of the most confusing issues in transitioning to the .NET Framework. An early start is a good idea.
 
If you are familiar with ADO (the COM-based data access strategy), then you are familiar with the Recordset component. A Recordset can operate in either a disconnected or a connected fashion, and it is capable of serving up data as XML. ADO Recordsets can also operate in server-side mode -- a capability ADO.NET currently does not support. This is rarely a concern, however, as keeping connections open in web-based applications is usually inadvisable.
 
Unlike ADO, ADO.NET does not offer a single component that operates in both a connected and a disconnected mode. Instead, there are two layers, the DataSet for working with data in a disconnected and updateable fashion and the DataReader for working with data in a high performance, connected fashion. Using a DataReader can be much faster than using a DataSet. We're going to try to take advantage of this in our architecture.
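As a quick orientation, here is a minimal sketch of the two layers side by side. It is written against the class names in the released framework (SqlConnection, SqlCommand, SqlDataReader, SqlDataAdapter); the beta-era names used elsewhere in this article, such as SQLConnection and SQLDataSetCommand, differ slightly, and the connection string variable is assumed.

  using System.Data;
  using System.Data.SqlClient;

  // Connected, forward-only, read-only access via the DataReader.
  SqlConnection conn = new SqlConnection(connString); // connString assumed
  conn.Open();
  SqlCommand cmd = new SqlCommand("SELECT * FROM Employees", conn);
  SqlDataReader reader = cmd.ExecuteReader();
  while (reader.Read()) {
       string firstName = reader.GetString(1); // read while connected
  }
  reader.Close();

  // Disconnected, updateable access via the DataSet.
  DataSet ds = new DataSet();
  SqlDataAdapter adapter = new SqlDataAdapter("SELECT * FROM Employees", conn);
  adapter.Fill(ds, "Employees"); // data is cached locally; connection can close
  conn.Close();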
 
Many, if not most, web sites (and systems) read from a database much more than they write to one. It would be nice if there were an encapsulation of the two ADO.NET layers that would present a single API into our SQL data. This mechanism should hide the differences in API between the DataSet and the DataReader. This, if you recall, is one of our requirements. 
 
Introducing SQLData
 
SQLData is a concept-turned-prototype that was created to abstract and unify the DataSet and DataReader into a single API. SQLData is implemented as a struct rather than as a class for performance reasons. If SQLData were a class, it would have to be created using the new operator, incurring both memory and object-creation penalties. One of our requirements is to avoid using the new operator as much as possible. A general rule to follow is that not doing something is usually faster than doing it, and on a large website with lots of traffic, performance issues are easier to fix at the beginning of a project than near the end. Because SQLData is a struct, it is a value type rather than a reference type (class instances are references and must be created using the new operator). This means that a local SQLData is allocated on the stack, just as a simple int would be.
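To make the cost difference concrete, here is a minimal sketch (the type names are illustrative, not part of the prototype):

  public struct DataAsStruct { public int Mode; } // value type
  public class  DataAsClass  { public int Mode; } // reference type

  public class Holder {
       // The struct lives inline inside Holder -- no separate allocation,
       // and nothing extra for the garbage collector to track. As a local
       // variable it would live on the stack.
       private DataAsStruct m_Struct;

       // Only a reference lives here; the object itself must be created
       // with 'new' and later reclaimed by the garbage collector.
       private DataAsClass m_Class = new DataAsClass();
  }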
 
 
Figure 2  SQLData Encapsulates the DataSet and DataReader
 
Figure 2 illustrates the main idea behind the SQLData struct: The SQLData struct becomes our access strategy. Only SQLData has to deal with and understand the differences between the DataSet and DataReader. Let's look at some code that illustrates how SQLData hides the complexity. The code below creates an Employee object and loads a read-only employee record. It then asks for the 'FirstName' column.
  public enum RWMode {
       Invalid = 0,
       Read, // Read-only data
       Write // Writeable data
  }

  Employee Emp = new Employee();
  Emp.Load(RWMode.Read, strEmpID); // strEmpID holds the employee's ID
  // Read the FirstName
  string FirstName = Emp.FirstName;
Simple enough. Let's look at the Employee class and see what it takes to implement this logic.
  public class Employee {
       private static readonly int FIRST_NAME = 1;

       public void Load(RWMode Mode, string EmpID) {
            m_Data.OrphanLoad(Mode,
                 "SELECT * FROM Employees WHERE EmployeeID = '" + EmpID + "'",
                 "Employees");
       }

       public string FirstName {
            get { return m_Data.GetStringField(FIRST_NAME); }
            set { m_Data.SetStringField(FIRST_NAME, value); }
       }

       //======================================================
       // Data Members
       //======================================================
       private SQLData m_Data;
  }
Notice that there is very little work for us to do. The 'get' and 'set' property accessors couldn't be easier. We simply ask the SQLData for the data in our FirstName column, which happens to be a string. The SQLData struct knows to use an ADO.NET DataReader because the Employee was created using RWMode.Read.
 
Notice that the Load method calls SQLData.OrphanLoad. SQLData provides two sets of methods: one set, prefixed with Orphan, lets the SQLData struct know to act as a stand-alone object; the other, prefixed with Collection, indicates to SQLData that it is to operate as a collection. The SQLData struct allows us to load and access data without worrying about the underlying ADO.NET access mechanism being used.
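Pieced together from the calls shown in this article, the split looks roughly like the following sketch; the real prototype's signatures and return types may differ.

  public struct SQLData {
       // Stand-alone ("orphan") mode: this struct owns a single row outright.
       public void OrphanLoad(RWMode Mode, string strSelect, string strTable) { /* ... */ }

       // Collection mode: this struct owns many rows and creates/bookmarks
       // the individual business objects through the ISQLCollection interface.
       public bool CollectionLoad(ISQLCollection ISQLColl, RWMode rwAccess,
            ObjectCreation ObjCreate, string strSelect, string strTable) {
            /* ... */ return true;
       }
  }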
 
This is not, however, all SQLData can do for us. SQLData can act as a bookmark -- in this context, a reference to an individual row in a set of data. The term bookmark is not new, as the ADO Recordset has a property named Bookmark. One SQLData struct can act as a bookmark into another SQLData struct. An Employee object housed in a collection uses its local SQLData struct to point to a row in the Employees collection. For instance, an Employee collection object can retrieve all of the employee records from the database, then create an Employee object and instruct it to point to a row of data in the collection. The SQLData struct does all of this on behalf of the Employees collection by looping through the data in the underlying DataSet or DataReader and passing itself as a bookmark to the individual Employee object. Thus, an Employee object uses its SQLData to hold the actual data when it exists as an orphan or uses it as a bookmark into an Employee collection.
 
An illustration here might prove useful. The Employee collection in Figure 3 contains ten employee records. Each Employee object is using its SQLData struct as a bookmark into the collection. This differs from the previous code sample, in which an individual Employee object was created. In that case, the Employee object's SQLData actually contained the data rather than acting as a bookmark into a collection. The SQLData struct hides all of this from the developer.
 
 
 Figure 3  SQLData acting as a bookmark into another SQLData struct
 
A special case exists in the SQLData struct that allows for a significant performance gain. Figure 3 shows a unique object for each row of data in our collection object. The performance gain can be realized by using the ObjectCreation enum. Let's look at how this is used.
  // How should the collection deal with object-creation issues?
  public enum ObjectCreation {
       Single, // Use a single object to represent the entire collection
       Unique  // Give every row in the collection its own object
  }
When using ObjectCreation.Single, you are telling the SQLData struct that although, for example, there are ten rows of Employee data contained within, the goal is to have them all represented by a single Employee object. If the desire is for a unique object to represent each row of data, the ObjectCreation.Unique value is used. This is easy to do because our Employee class already supports the bookmark mechanism. Instead of creating ten separate Employee objects, the SQLData struct can create just one and set its bookmark to the current row during the foreach iteration logic. The developer has no way of knowing that this is all going on. As I will demonstrate later, using the ObjectCreation.Single hint can sometimes have a dramatic effect on performance. This option is available only with read-only data. The initial prototype supported ObjectCreation.Single for writeable data as well, but in the end it didn't make sense. Let's look at a code sample.
  // Create an Employees collection and load all employees
  Employees Emps = new Employees();
  // Request that there be only one Employee object created
  Emps.LoadAll(RWMode.Read, ObjectCreation.Single);
  string str;
  // Loop through and access all of the properties
  foreach (Employee Emp in Emps) {
       str = Emp.EmployeeID;
       str = Emp.FirstName;
       str = Emp.LastName;
       str = Emp.City;
       str = Emp.State;
       str = Emp.Zip;
  }
The code above, although it might loop through a thousand employee records, will create and use only a single Employee object to do so, because it uses the ObjectCreation.Single value. Very few resources are created, considering the functionality of this code, and no hacky workarounds are used. All of our business logic is right there in the middle of the mix, ready to act when the moment arises.
 
Syntax
 
There are a few reasons why the foreach syntax alone was chosen for iterating over the objects in our collection, rather than an indexer or some other mechanism. This syntax forces the developer to declare the iteration variable within the foreach statement itself. For example, the following code will not compile because the Employee object is declared outside the foreach parentheses:
  // Declare our Employee object here -- WILL NOT COMPILE !!!
  Employee Emp;
  // Loop through and access all of the properties
  foreach (Emp in Emps) {
       // ...
  }
The developer cannot accidentally hang on to an instance outside of the foreach loop. It's not foolproof, but it goes a long way in avoiding misconceptions, especially when using ObjectCreation.Single, where only one object exists in the collection anyway.
 
However, this is not the only reason to use the foreach syntax. There is something very interesting here that one could easily gloss over, and it's related to the compile error shown above. The foreach syntax is implemented via the IEnumerator interface, and objects are returned through its Current property:
  public object Current {
       get { /* your code goes here */ }
  }
Notice that the return type of the Current property is object, yet, as observed in the foreach loop above, no typecasting was done. None is required, for the same reason that declaring the iteration variable outside of the parentheses generates a compile error: a foreach loop must name the concrete type (here, Employee) within its parentheses, and the compiler performs the required typecast for us based on that concrete type.
 
This very convenient capability of the foreach syntax becomes key to our SQLData struct concept. The SQLData struct contains all of the data in our system, either in a DataSet or a DataReader. Furthermore, our SQLData struct has no idea what kind of objects it actually contains, so it is forced to work with a collection of type object. If it weren't for the foreach syntax, there would either be a lot of ugly typecasting or plenty of needless class derivations simply to return a concrete data type rather than an object.
Don't overlook the foreach syntax: it offers a built-in typecast -- something an indexer doesn't.
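To make that built-in typecast visible, here is roughly what the compiler generates for the earlier loop (a simplified sketch of the expansion):

  IEnumerator e = Emps.GetEnumerator();
  while (e.MoveNext()) {
       // The cast from 'object' to 'Employee' is supplied by the compiler,
       // based on the concrete type named in the foreach parentheses.
       Employee Emp = (Employee) e.Current;
       string firstName = Emp.FirstName;
  }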
 

SQLDataEnumerator

 
In order to support the foreach syntax, a generic class is needed that implements IEnumerator and is capable of iterating over our SQLData struct. That class is called SQLDataEnumerator, and it can iterate over either a DataSet or a DataReader. Below is a code snippet demonstrating how the Employees collection implements the foreach syntax by using SQLDataEnumerator.
  public class Employees : IEnumerable {
       public IEnumerator GetEnumerator() {
            // SQLDataEnumerator implements 'foreach' and it
            // handles all of the issues related to DataSet,
            // SQLDataReader, and even 'Single' or 'Unique'
            // object-creation rules.
            return new SQLDataEnumerator(this, ref m_Data);
       }
  }
The first step is for the Employees collection to implement the IEnumerable interface (see above). This indicates that the Employees collection supports the foreach syntax. The only method in the IEnumerable interface we need to implement is GetEnumerator. This is quite easy -- all we have to do is create our SQLDataEnumerator object, being sure to pass it our SQLData struct.
 
The SQLDataEnumerator constructor is shown below. Notice that the Employees collection passes this as the first parameter. The SQLDataEnumerator expects an ISQLCollection interface as its first parameter. We'll talk about that in the next section.
  internal class SQLDataEnumerator : IEnumerator {
       public SQLDataEnumerator(ISQLCollection ICollSQLObject, ref SQLData MyData) {
            // stuff goes on in here
       }
  }
The SQLDataEnumerator is responsible for implementing the IEnumerator interface (not to be confused with the IEnumerable interface used by the Employees collection above).
  internal class SQLDataEnumerator : IEnumerator
  {
       public SQLDataEnumerator(ISQLCollection ICollSQLObject, ref SQLData MyData);

       // These three members fulfill the IEnumerator contract
       public void Reset();
       public object Current; // This is a 'get' property
       public bool MoveNext();
  }
The programmer almost never uses an IEnumerator directly; the IEnumerator interface is handled by the foreach syntax.
 

SQLData and Interfaces

 
With the use of the interface mechanism, the SQLData struct obviates the need to know concrete class types. Base classes with virtual methods could have been used in place of interfaces; however, it's quite easy to hide interface methods from the consumers of our classes (through explicit interface implementation), whereas public virtual methods on a base class remain visible to everyone. That visibility can be worked around, but why bother? Only the SQLData struct and our internal namespace should know about these interfaces. Since SQLData doesn't want to know concrete class types such as our Employees or Dependents classes, it asks you to implement two interfaces:
  1. ISQLCollection for the collection classes, i.e., Employees and Dependents
  2. ISQLObject for the individual classes, i.e., Employee and Dependent
Let's take a look at the interface definitions, and we'll see how they help us.
  interface ISQLCollection {
       ISQLObject i_CreateISQLObject(); // Hey Collection -- create an object of your type
  }

  interface ISQLObject {
       void i_AddNew(SQLData Data);      // A new ISQLObject was added to the collection
       void i_Delete();                  // An ISQLObject was just deleted from the collection
       void i_PreSave();                 // About to save this ISQLObject
       void i_PostSave();                // Just saved this ISQLObject
       DataRow i_GetDataRow();           // Sometimes the Collection needs our DataRow
       void i_SetBookmark(SQLData Data); // Think 'ADO' bookmark
  }
Both interfaces are called only by the SQLData struct, and only when it is acting as a collection of business objects, such as an Employees or Dependents collection. Collections inherit from the ISQLCollection interface. Individual objects such as the Employee and Dependent classes inherit from the ISQLObject interface.
The only method an ISQLCollection must implement is i_CreateISQLObject, which is called by the SQLData struct in order to create individual business objects. SQLData does so when creating individual objects to represent a row of data in its collection (see Figure 3).
 
Below is all that our Employees collection must do to load all of the employees in our database. LoadAll takes the two enums we discussed earlier. The call to m_Data.CollectionLoad is passed the this pointer, thereby providing the SQLData an ISQLCollection interface. SQLData now knows it is residing in a collection rather than in an individual object.
  public class Employees : IEnumerable, ISQLCollection {
       public void LoadAll(RWMode rwAccess, ObjectCreation InstInfo) {
            // Ask our SQLData object to load this data
            m_Data.CollectionLoad(this, rwAccess, InstInfo, "SELECT * FROM Employees",
                 "Employees");
       }

       ISQLObject ISQLCollection.i_CreateISQLObject() {
            return (ISQLObject) new Employee();
       }

       //===============================================
       // Data Members (only 1 data member !!)
       //===============================================
       private SQLData m_Data;
  }
Here is a technical representation of how the SQLData struct uses interfaces. In this case, an Employee collection has been created that contains many rows of data and has a unique Employee object to represent each row. Only one Employee object is shown; in reality, however, there would be many.
 
 
Figure 4  SQLData's Interface Usage
 
The m_Objects list in the SQLData struct is a collection of ISQLObjects (Employee objects), and the m_ISQLColl data member points to our Employees collection. Interestingly, both the Employees and Employee objects contain an SQLData struct. The SQLData struct in the Employees collection actually contains the data, whereas the SQLData struct in the Employee object is acting as a bookmark. Technically, there is a third scenario: an individual Employee object may be created, in which case the SQLData is just plain old data -- no collection, no bookmark.
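One way to summarize the three roles just described (purely illustrative; no such enum exists in the prototype):

  public enum SQLDataRole {
       Orphan,     // owns a single row outright -- no collection, no bookmark
       Collection, // owns all of the rows plus the list of ISQLObjects
       Bookmark    // points at one current row inside another SQLData's collection
  }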
 
Let's look at SQLData's (simplified) CollectionLoad method so we can see how this all happens:
  public bool CollectionLoad
  (
       ISQLCollection ISQLColl,
       RWMode rwAccess,
       ObjectCreation ObjCreate,
       string strSelect,
       string strTable
  )
  {
       // Grab onto and save this all-important information
       m_ISQLColl = ISQLColl;
       m_RWAccess = rwAccess;
       m_ObjectCreation = ObjCreate;
       m_strSelect = strSelect;
       m_strTable = strTable;

       if (IsWriteable) {
            // We always fetch the data up front for writeable data
            FetchWriteableData();
            m_Objects = new ObjectList();
            int iRowCount = m_dsDataTable.Rows.Count;
            // Create our Employees collection
            if (iRowCount > 0) {
                 for (int iCount = 0; iCount < iRowCount; iCount++) {
                      // Create our ISQLObject; for instance, if this is an Employees
                      // collection we are asking our Employees collection to create
                      // an Employee object
                      ISQLObject IObject = m_ISQLColl.i_CreateISQLObject();
                      // This is kind of like an ADO bookmark
                      m_dsCurrentRow = m_dsDataTable.Rows[iCount];
                      // The ISQLObject ("Employee") points to this m_dsCurrentRow and
                      // makes a shallow copy of the 'this' pointer passed in, which is
                      // a SQLData type. Thus the m_dsCurrentRow lives on in the
                      // object's own SQLData copy -- it has its bookmark.
                      IObject.i_SetBookmark(this);
                      m_Objects.Add(IObject);
                 }
            }
       }
       // ... (read-only path omitted in this simplified listing)
  }
The i_CreateISQLObject method in our ISQLCollection interface depends on the Employee class inheriting from the ISQLObject interface, which it does.
 

The Hierarchical Model

 
We have chosen to express our business objects (or middle tier) as a set of hierarchical objects. This allows us to iterate over our data in a very natural way. XML is such a nice way to represent data precisely because it handles hierarchy so well. Our employee-dependent relationship is hierarchical, so why not express our middle tier hierarchically too?
  string FirstName;
  Employees Emps = new Employees();
  Emps.LoadAll(RWMode.Read, ObjectCreation.Single);
  // For every Employee in our Employees collection
  foreach (Employee Emp in Emps) {
       FirstName = Emp.FirstName;
       // Given this Employee, loop through its Dependents
       foreach (Dependent Dep in Emp.Dependents) {
            FirstName = Dep.FirstName;
       }
  }
This hierarchical model is reflected in many other areas of our architecture as well.
 
As is the case with iterating using the foreach syntax, both saving and deleting objects are hierarchical operations. For instance, we can load a writeable collection of Employees and then change each employee's FirstName to Bob. We could continue by iterating over each employee's Dependents collection, changing each dependent's FirstName to Joe. We could then commit all of the changes to the database simply by calling Employees.SaveAll, which saves all of the changes made to our dependent data as well.
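As a sketch, the scenario just described looks like this, using only the calls shown in this article (and assuming Dependent exposes a FirstName property just as Employee does):

  Employees Emps = new Employees();
  Emps.LoadAll(RWMode.Write, ObjectCreation.Unique);
  foreach (Employee Emp in Emps) {
       Emp.FirstName = "Bob";
       foreach (Dependent Dep in Emp.Dependents) {
            Dep.FirstName = "Joe";
       }
  }
  // One call commits everything: employees first, then each
  // employee's Dependents collection via i_PostSave.
  Emps.SaveAll();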
 
This is all done through the ISQLObject interface described earlier. When we save data, we save it from the top down; i.e., Employees, then Dependents. We save from the top down so that we won't get into trouble with any referential integrity that may exist in our database. If you look at the i_PostSave method in the Employee class, you'll notice that this is where we save our Dependents child collection (hence the Post in PostSave). Accordingly, when we call DeleteAll, we delete from the bottom up, again to comply with database referential-integrity rules. Let's look at the Employee object's implementation of ISQLObject.i_PostSave:
  public class Employee : ISQLObject {
       void ISQLObject.i_PostSave() {
            // We are housed in a collection and our data
            // has just been saved to the database; we now
            // need to call SaveAll() on any objects that may have
            // been accessed.
            if (m_Dependents != null) {
                 m_Dependents.SaveAll();
            }
       }
  }
Notice that the method i_PostSave is prefixed with ISQLObject. This is not necessary, but doing so hides the interface method from the consumers of our MiddleTier namespace. Since there is no way of publicly knowing that these interfaces are supported, consumers of the MiddleTier namespace could never call them. This is exactly the behavior we want.
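A quick sketch of that hiding at work (illustrative only):

  Employee Emp = new Employee();
  // Emp.i_PostSave();             // will not compile: an explicitly implemented
                                   // method is not part of Employee's public surface
  ((ISQLObject) Emp).i_PostSave(); // reachable only through the interface, and
                                   // ISQLObject stays internal to the MiddleTier
                                   // namespace, so outside consumers can't even cast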
 

Strongly Typed Datasets

 
The experienced ADO.NET developer must be asking by now, "Why not just use the Server Explorer to drag and drop a DataSet, Connection, and DataSetCommand onto a form, and generate the desired Employees collection class through the wizard mechanism?"
 
This is exactly what I did. I created a class through the wizard named tdsEmployees where tds stands for Typed DataSet. The tdsEmployees class inherits from DataSet. Below is a code sample for comparison purposes. The code sample selects all of the employees in the database using both the Employee and the tdsEmployees classes.
  //----------------------------------------------------
  // Employees Object - Loop through all of them
  //----------------------------------------------------
  Employees Emps = new Employees();
  Emps.LoadAll(RWMode.Write, ObjectCreation.Unique);
  string str;
  foreach (Employee Emp in Emps) {
       str = Emp.EmployeeID;
       str = Emp.FirstName;
       str = Emp.LastName;
       str = Emp.City;
       str = Emp.State;
       str = Emp.Zip;
  }

  //---------------------------------------------------------
  // tdsEmployees Object - Loop through all of them
  //---------------------------------------------------------
  tdsEmployees.EmployeesRow row;
  tdsEmployees tdsEmps = new tdsEmployees();
  SQLConnection myConnection = new SQLConnection(Globals.DBConnString);
  SQLDataSetCommand dsCommand = new SQLDataSetCommand("SELECT * FROM Employees", myConnection);
  dsCommand.FillDataSet(tdsEmps, "Employees");
  int iCount = tdsEmps.Employees.Count;
  for (int i = 0; i < iCount; i++) {
       row = tdsEmps.Employees[i];
       str = row.EmployeeID;
       str = row.FirstName;
       str = row.LastName;
       str = row.City;
       str = row.State;
       str = row.Zip;
  }
The two mechanisms look very similar at first, but there are some very important differences.
 
The pro side of the tdsEmployees strongly typed class
 
The tdsEmployees class was generated in about a minute, and true to its name, it is strongly typed. The wizard asked for some database information, and then it created the tdsEmployees class with all of its get and set property accessors, i.e., FirstName, LastName, and so on.
 
Quite a few methods in the tdsEmployees class were also generated by the wizard for me based on the structure of the Employees database table. Here are just a few to give you a feel for what a wizard-generated strongly-typed DataSet contains:
  public virtual void AddEmployeesRow(EmployeesRow rowEmployeesRow);
  public virtual EmployeesRow FindByEmployeeID(string columnEmployeeID);
  public virtual EmployeesRow NewEmployeesRow();
  // And a ton of notifications such as
  protected virtual void OnFirstNameColumnChanging(DataColumnChangeEventArgs e);
  protected virtual void OnLastNameColumnChanging(DataColumnChangeEventArgs e);
Note that these methods are mostly virtual, with some even being protected; deriving from tdsEmployees seems to be the intended path.
 
The con side of the tdsEmployees strongly typed class
 
The obvious difference at first glance is that using the tdsEmployees class requires my SQL query and setup work to live outside of the class. In other words, for tdsEmployees to become a real middle-tier object, we've got some work to do. As noted on the pro side, the derivation is the intended path to turn the tdsEmployees object into something more useful that could contain our business logic. So we would need to create a class that inherits from our tdsEmployees class (you can only inherit from one class).
 
Also, unlike the hierarchical set of objects that we created, the tdsEmployees is a collection only -- a collection of one, maybe, but still a collection. There is no single tdsEmployee object. There is an EmployeesRow class that inherits from the ADO.NET DataRow class, and it would be possible to derive an Employee object from EmployeesRow, but you would have some work to do because of the internal structure of the tdsEmployees class. Also, when you inherit from a class in C#, it is always a public inheritance. Unlike in C++, in C# you cannot inherit using the protected or private keywords. This means that you might need to hide and/or overload some methods to ensure a coherent interface.
 
Finally, tdsEmployees has no way to use a DataReader for fast performance when desired. You would either have to wedge it in (which would be extremely difficult) or use another object entirely. DataSets may use DataReaders underneath the covers, but for performance reasons, you want to skip the DataSet entirely and use a DataReader if you can for read-only data.
 
Performance
 
After moving the SQLData struct from concept to prototype I was curious about its performance as compared to a strongly typed dataset. I was also eager to know whether bypassing the DataSet and going directly to the DataReader would provide a worthwhile performance boost.
 
The first order of business was to write a timer function to time our tests. Eric Gunnerson provided a good starting point in his book A Programmer's Introduction to C#, published by Apress. This book proved invaluable to me when it came to issues related to the C# language and certain system namespaces.
 
Once I had a generic timing method to which I could pass a delegate (C++ programmers: think function pointer), I was able to gather the necessary performance statistics.
The first test I ran accessed 2000 employee records, looping through each employee and accessing all six fields of each object. I ran this test in four different ways:
1) Using the tdsEmployees collection (wizard generated strongly typed dataset)
2) Using our Employees collection with RWMode.Read and ObjectCreation.Single
3) Using our Employees collection with RWMode.Read and ObjectCreation.Unique
4) Using our Employees collection with RWMode.Write and ObjectCreation.Unique
 
I've taken a little creative license with the following code to make it more readable. I think it is important to know how the timings were gathered (all of the real code is available at the end of this document).
  using System;
  using System.Diagnostics;

  public delegate void TestRoutine(...);

  public static void TimeFunction(TestRoutine func, int Iterations, ...) {
       // NOTE: Throughout this timing routine I'm gathering statistics,
       // such as average, max, min, total time, the memory before
       // and after garbage collection, and even the time it took
       // to do the garbage collection itself.
       GarbageCollect();
       for (int i = 0; i < Iterations; i++) {
            // Call the TestRoutine
            func();
       }
       // Time this too!
       GarbageCollect();
       Console.WriteLine( /* spit out the stats */ );
  }

  public static void RunTests() {
       // Call each test routine once before starting the timings
       // to eliminate first-time load variations.
       AlltdsEmployees();
       AllEmployees(RWMode.Read, ObjectCreation.Single);
       for (int i = 0; i < 10; i++)
            TimeFunction(new TestRoutine(AlltdsEmployees), 1);
       for (int i = 0; i < 10; i++)
            TimeFunction(new TestRoutine(AllEmployees), 1, RWMode.Read, ObjectCreation.Single);
       for (int i = 0; i < 10; i++)
            TimeFunction(new TestRoutine(AllEmployees), 1, RWMode.Read, ObjectCreation.Unique);
       for (int i = 0; i < 10; i++)
            TimeFunction(new TestRoutine(AllEmployees), 1, RWMode.Write, ObjectCreation.Unique);
  }
Let's look at the numbers.
 
The statistics were gathered on a 600 MHz Dell Inspiron 3800 laptop running Windows 2000 Server Edition SP1 with 256 MB of RAM. The test code was housed in a C# console application, with both the MiddleTier namespace and the console application compiled in Release mode.
 
The legend is as follows: T = Total time, M = Min time, X = Max time, A = Average time, and GC = the time it took to do a garbage collection after the test. All times are in seconds. In this particular test suite T, M, X, and A are all the same because I ran only a single iteration. I ran each test ten times in a row, however, to demonstrate the predictability. Notice the blip in the time on the third test suite. This would happen periodically due to disk access but could happen in any area. Still, the tests and times are very predictable and repeatable.
 
 
NOTE:
 
The observant reader will notice a direct correlation between the garbage collection time and the overall time.
 
If you look at the times for tdsEmployees, you'll notice that it took about 245 milliseconds, or roughly a quarter of a second, to retrieve and loop through all 2000 employees, accessing all six fields. Also, the garbage collection took roughly another .020 seconds after each test. Compare this with our read-only Employees collection using the ObjectCreation.Single performance hint: the Employees collection took about 100 milliseconds (one-tenth of a second) to do the same thing, with only .006 seconds spent in garbage collection.
 
The Employees collection completed the task in only 40% of the time it took the tdsEmployees collection to do the same amount of work. Garbage collection was also helped by the ObjectCreation hint, dropping from .020 seconds in the tdsEmployees test to .006 seconds in the Employees test.
 
You might be surprised to see that the Employees test using the Read/Unique combination took longer than the Write/Unique combination. After all, RWMode.Read uses the ADO.NET DataReader while RWMode.Write uses the ADO.NET DataSet. Aren't DataReaders faster than DataSets? The reason is that in the Read/Unique combination, each Employee object must copy the data out of the DataReader and into its own local buffers. DataReaders support reading a forward-only stream of records; multiple rows of data cannot be held onto simultaneously. Take a look at the way the Employee object handles the i_SetBookmark method.
  void ISQLObject.i_SetBookmark(SQLData MyData) {
       // Do a shallow copy of the SQLData struct
       m_Data = MyData;
       // If we're read-only and they want unique instances,
       // we must copy the data out of the SQLDataReader
       if (m_Data.IsReadOnly && m_Data.UniqueInstances) {
            CloneReadOnlyData();
       }
  }
Remember the golden rule: not doing something is faster than doing it, as revealed by the numbers in the Read/Unique Employees test. Always avoid copying data out of a buffer and into your own buffer when possible. The Read/Unique (DataReader) Employees test lost out to the Write/Unique (DataSet) combination although it still outperformed the tdsEmployees Dataset.
 
If you continue to examine the statistics, you'll see that all of the Employees collection tests outperformed the tdsEmployees collection. More important was the Employees test using the RWMode.Write/ObjectCreation.Unique combination: this collection has all of the same capabilities as the tdsEmployees collection (it actually has more). If you look at the garbage collection times associated with the tdsEmployees test, you'll notice that they are quite a bit higher than in the rest of the tests. Having looked in detail at the tdsEmployees collection class, I suspect I know what is eating up the cycles, but I would hesitate to say until I know for sure.
 
How about a more realistic test? I mean, who retrieves 2000 employees per web hit anyway?  
 
Here are the numbers for a test in which a single record was fetched from the employee table. In order to reach any measurable amount of time, I had the test routine retrieve a single employee record 3000 times accessing all six fields each time. Think of this test as 3000 individual hits against a web site.
 
 
The Employee numbers were based on a read-only (individual) Employee object (which uses the DataReader). When I ran the same test using a writeable Employee object, there was no difference between the times of the tdsEmployees collection and a writeable Employee object. So in 3000 web hits, I saved about six to seven seconds using a read-only Employee object.
 

Conclusion

 
The .NET Framework is a tremendously productive and rich environment; I created the entire code base for this article in about a week. ADO.NET is a significant departure from ADO, and I recommend that you begin to study it seriously now, before you're facing an actual .NET deadline. The DataReader object can yield a significant performance boost in certain situations, as was demonstrated by the SQLData struct and the performance statistics presented in this article. Although allocating objects is very fast in the .NET Framework, not allocating them is even faster; the performance numbers and the time to clean up afterward are, not surprisingly, linked. The author is well aware that a hierarchical middle-tier architecture is inadequate in certain situations; sometimes database views must be created to retrieve data efficiently where a hierarchical model would prove too slow. This paper was meant to be a learning exercise -- it was not intended to present the SQLData struct as production code, although someone could attempt to take it there. The lingering question this paper leaves unanswered is how the middle tier handles incoming XML. How do we trigger all of the business logic? Stay tuned.
 
The Source Code
 
The source code is available for viewing. 
 
There are some areas of unfinished business in the code, however. For example, exception handling is weak. Also, our four business objects currently have a use-once-only policy; I did not add the ability to reuse them. For example, if you created an Employee object and then called its Load method, followed by another call to Load or AddNew, the results would be unpredictable. Correcting this shortcoming would not be difficult, however.

