FREE BOOK

Chapter 8: C# 4.0 Features

Posted by Addison Wesley Free Book | C# Language February 02, 2010
This chapter looks at the new features added into C# 4.0 that combine to improve code readability and extend your ability to leverage LINQ to Object queries over dynamic data sources.

Building the Microsoft Excel Row Iterator

To expose Microsoft Excel in a way that supports LINQ queries, an iterator must be built that internally reads data from an Excel spreadsheet and exposes this data as an IEnumerable collection, row by row. The skeleton of the Excel row iterator, without implementation is

public IEnumerable<List<dynamic>> GetExcelRowEnumerator(
    string fileName,
    int sheetIndex)
{
    // Declare an array to hold our values
    // Create the COM reference to Excel
    // Open the workbook by filename
    // Get the excel worksheet, 1 for the first, etc.
    // Find the used range in the sheet
    // Read in the value array for all used rows and columns
    // Close Excel
    // Build and yield each row at a time

}

This iterator declaration takes arguments of a fully qualified filename to an Excel spreadsheet and a worksheet by a one-based number and returns the values in each row (with each column's value as an item in an List<dynamic> collection) from the chosen worksheet in the selected Excel file. Excel allows multiple pages, called worksheets, in a single spreadsheet file. The full implementation of this algorithm is shown in Listing 8-7.

This implementation isn't the strategy to be used for extremely large spreadsheets because it buffers the entire dataset into an in-memory array with a single call and then builds the row results from this array of values. This technique, however, is the fastest way to access data from Excel using COM-Interop because it avoids single cell or row access and keeps Excel open (a large executable memory footprint) for as short a time as possible. If an application is required to read a massive spreadsheet of data, experiment with alternative value access strategies supported by Excel's extensive object model row by row perhaps to avoid completely loading the entire array into memory upfront. This implementation is fine in performance and memory usage for most purposes.

The using Declaration for the Following Examples

To avoid having to prefix all calls to the Interop library, I added the following using declaration at the top of my class file:

using Excel = Microsoft.Office.Interop.Excel;

This simplified the code declarations and allowed me to use Excel.Application, Excel.Workbook (and others) rather than Microsoft.Office.Interop.Excel.Application, Microsoft.Office.Interop.Excel.Workbook, and so on. See the example code posted on http://hookedonlinq.com/linqbook.ashx for the full source code.

Listing 8-7

Full code listing for an Excel row enumerator. Calling this method enumerates the values of a row in an Excel spreadsheet. Each row's column values are in a collection.

public IEnumerable<List<dynamic>> GetExcelRowEnumerator(
   string fileName,
   int sheetIndex)
{
   // declare an array to hold our values
   object[,] valueArray = null;
   // create the COM reference to Excel
   var excel = new Excel.Application();
   try
   {
       // open the workbook by filename
       Excel.Workbook workBook =
           excel.Workbooks.Open(fileName);
       if ( workBook != null &&
           sheetIndex < workBook.Sheets.Count )
       {
           // get the worksheet, 1 for the first, etc.
           Excel.Worksheet sheet =
               workBook.Sheets[sheetIndex];
           // find the used range in the sheet
           Excel.Range usedRange = sheet.UsedRange;
           // read in the value array, this is the fastest
           // way to get all values in one hit.
           valueArray = usedRange.get_Value(
               Excel.XlRangeValueDataType.xlRangeValueDefault);
       }
       workBook.Close(false, fileName);
   }
   finally
   {
       // finished with Excel now, close.
       excel.Quit();
   }
   // build and yield each row at a time
   for ( int rowIndex = 1;
       rowIndex <= valueArray.GetLength(0);
       rowIndex++)
       {
       List<dynamic> row =
           new List<dynamic>(
               valueArray.GetLength(1));
       // build a list of column values for the row
       for (int colIndex = 1;
           colIndex <= valueArray.GetLength(1);
           colIndex++)
           {
           row.Add(
               valueArray[rowIndex, colIndex]);
       }
       yield return row;
   }

}

Writing LINQ queries against Microsoft Excel data, like that shown in Figure 8-2, can be written in the code form shown in Listing 8-8. The output from this code is the subset of rows that have a State value of WA, which for the data shown are these three rows:

Customers in WA (3)
VALDES, Armando
KAGEL, Stewart
LARD, Chance

Listing 8-8

Example code for reading the data from an Excel spreadsheet and running a LINQ query over its contents

string filename = Path.Combine(
   Environment.CurrentDirectory, "Data/SampleExcel.xlsx");

const
int firstNameCol = 0;
const
int lastNameCol = 1;
const
int stateCol = 5;
var
q = from row in GetExcelRowEnumerator(filename, 1)
       where row[stateCol] == "WA"
       select row;

Console
.WriteLine("Customers in WA ({0})", q.Count());
foreach
(var row in q)
{
   Console.WriteLine("{0}, {1}",
       row[lastNameCol].ToUpper(), row[firstNameCol] );

}

The return type of the row values is declared as type List<dynamic>. It easily could have been declared as type List<object>. The downside of declaring the values as type of object rather than dynamic comes down to the ability to treat the value members as the underlying type. For example, in Listing 8-8 the statement row[lastNameCol].ToUpper()would fail if the element types were declared as object. Object doesn't have a method called ToUpper, even though the underlying type it is representing is a string. And to access that method a type cast needs to be added, bloating the code out to ((string)row[lastNameCol]).ToUpper(). Declaring the element type as dynamic in the collection allows the runtime to look up method names using reflection on the underlying type at runtime, however the particular type of that column value is declared as in Excel (in this case a string, but some columns are DateTime and double). The removal of the type casting when calling methods or properties on object types simplifies and improves code readability.

The GetExcelRowEnumerator method could be enhanced by combining the header row reading and accessibility in a similar fashion to that used by dynamic lookup in Listing 8-4, which would eliminate the need to hardcode the column index positions and allow the row data to be accessed by column header name using simple property access syntax.

Total Pages : 11 7891011

comments