Solution For Parsing an Excel File in C# and the Cells Seem to Get Cut Off at 255 Characters

Problem

An Excel sheet containing perhaps 10 rows and a couple of columns. In a couple of the rows, the length of the data is more than 255 characters. When I try to upload this file in an ASP.Net application using the traditional OLEDB provider I do not see all the data; in particular the cells are truncated to 255 characters when loaded into my dataset object.

Solution

Having done sleepless nights I have found some solutions to fix this issue, which are presented below.

Solution 1

If your application is only used on one computer then you can go directly to the following registry settings and change the TypeGuessRows value:

HKLM\Software\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows

64 bit systems

HKLM\SOFTWARE\wow6432node\microsoft\jet\4.0\engines\excel\TypeGuessRows

By setting this value to zero, every line of your spreadsheet is scanned for type guessing, rather than the default of 8 lines. If any text field is longer than 255 characters are encountered, then those columns are deemed to be memo fields.

Note that you are still not 100% guaranteed to get the right data types, depending on your data.

Note also the HKLM scope of this key though; it will affect every OleDB Excel import by any process on that machine and this leads to a degradation of performance depending on the size of the data.

Solution 2

A second way to work around this problem (without modifying the registry) is to make sure that rows with fields, which have data 255 characters or greater, are present in the first 8 rows (the default value of TypeGuessRows is 8) of the source data file.

Solution 3

This is the recommended solution by me as there is no need to change any registry or take care to have those lengthy data to be in first 8 rows. Instead we have a tool called NPOI which can be download from npoi.codeplex.com.

Using this dll we can upload the spreadsheet without concern for data truncation and also it has many features like creating the spreadsheet on the fly including charts, reports etc.., for more information you can find it on the site npoi.codeplex.com.

Anyway, reading data using this NPOI is different from the traditional OLEDB provider. Please find the following method which will return a DataTable object by sending the File Path and the respective SheetName as input:

public static DataTable getExcelData(string FileName, string strSheetName)

    {

        DataTable dt = new DataTable();

        HSSFWorkbook hssfworkbook;

        using (FileStream file = new FileStream(FileName, FileMode.Open, FileAccess.Read))

        {

            hssfworkbook = new HSSFWorkbook(file);

        }

 

        ISheet sheet = hssfworkbook.GetSheet(strSheetName);

        System.Collections.IEnumerator rows = sheet.GetRowEnumerator();

       

        while (rows.MoveNext())

        {

            IRow row = (HSSFRow)rows.Current;

 

            if (dt.Columns.Count == 0)

            {

                for (int j = 0; j < row.LastCellNum; j++)

                {

                    dt.Columns.Add(row.GetCell(j).ToString());

                }

 

                continue;

            }

 

            DataRow dr = dt.NewRow();

            for (int i = 0; i < row.LastCellNum; i++)

            {

                ICell cell = row.GetCell(i);

 

                if (cell == null)

                {

                    dr[i] = null;

                }

                else

                {

                    dr[i] = cell.ToString();

                }

            }

            dt.Rows.Add(dr);

        }

 

        return dt;

    }

May be I presented the solutions too directly without adequate explanation or discussion but presently my motto is to provide you the reasonable and permanent solution for those suffering from a similar problem.

Hope this research and the code helps you a lot. If so then please drop a comment below for me..

Happy Coding.