Count Duplicate Values in Excel Column Dynamically

Introduction

I have one Excel file to be generated from a database, it is available in the attachment with the name bbbb.xlsx.

Part of my project needed to detect duplicate rows in one of the columns named Root Cause with a dynamically generated percentage in the chart control.

So first we will convert the Excel data into a DataTable.

Here is the sample code: 
  1. protected void btnUpload_Click(object sender, EventArgs e)   
  2.  {   
  3.      if (FileUpload1.HasFile)   
  4.      {   
  5.          string FileName = Path.GetFileName(FileUpload1.PostedFile.FileName);   
  6.          string Extension = Path.GetExtension(FileUpload1.PostedFile.FileName);   
  7.          string FolderPath = ConfigurationManager.AppSettings["FolderPath"];   
  8.    
  9.          string FilePath = Server.MapPath(FolderPath + FileName);   
  10.          FileUpload1.SaveAs(FilePath);   
  11.          Import_To_Grid(FilePath, Extension, rbHDR.SelectedItem.Text);   
  12.      }   
  13.  }   
  14.  private void Import_To_Grid(string FilePath, string Extension, string isHDR)   
  15.  {   
  16.      string conStr = "";   
  17.      switch (Extension)   
  18.      {   
  19.          case ".xls"//Excel 97-03   
  20.          conStr = ConfigurationManager.ConnectionStrings["Excel03ConString"].ConnectionString;   
  21.          break;   
  22.          case ".xlsx"//Excel 07   
  23.          conStr = ConfigurationManager.ConnectionStrings["Excel07ConString"].ConnectionString;   
  24.          break;   
  25.      }   
  26.      conStr = String.Format(conStr, FilePath, isHDR);   
  27.      OleDbConnection connExcel = new OleDbConnection(conStr);   
  28.      OleDbCommand cmdExcel = new OleDbCommand();   
  29.      OleDbDataAdapter oda = new OleDbDataAdapter();   
  30.      DataTable dt = new DataTable();   
  31.      cmdExcel.Connection = connExcel;   
  32.    
  33.      //Get the name of First Sheet   
  34.      connExcel.Open();   
  35.      DataTable dtExcelSchema;   
  36.      dtExcelSchema = connExcel.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);   
  37.      string SheetName = dtExcelSchema.Rows[0]["TABLE_NAME"].ToString();   
  38.      connExcel.Close();   
  39.    
  40.      //Read Data from First Sheet   
  41.      connExcel.Open();   
  42.      cmdExcel.CommandText = "SELECT * From [" + SheetName + "]";   
  43.      oda.SelectCommand = cmdExcel;   
  44.      oda.Fill(dt);   
  45.      connExcel.Close();   
  46.  }  

The code above uses a File Upload Control to export an Excel sheet into a DataTable. If the file extension is .xls then it is treated as a (up to) 2003 version but if the file extension is .xlsx then it is treated as a 2007 (and above) version. So finally in this step we must send the data into the data table.

Now, by using this data table we will count the number of duplicate rows. For that purpose we will use the two methods LINQ and the Dictionary class. 

  1. var query = dt.AsEnumerable().GroupBy(r => new { Name = r.Field<string>("F13") }).Select(grp => new   
  2.  {   
  3.      Name = grp.Key.Name,   
  4.      count = grp.Count()   
  5.  });   
  6.    
  7.  for (int i = 0; i < query.Count(); i++)   
  8.  {   
  9.      var item = query.ElementAt(i);   
  10.      var itemname = item.Name;   
  11.      var itemcount = item.count;   
  12.    
  13.      Response.Write("Name : " + itemname + " - - - - - Count :" + itemcount+ " <br>");   
  14.      Response.Write("\n");   
  15.  }  
If you see in the code snippet above, using a LINQ query and r.Field<string>("F13") we specify the "Root Cause" column in the Excel file.
  1. Dictionary<stringint> ds = new Dictionary<stringint>();   
  2.  for (int i = 7; i < dt.Rows.Count; i++)   
  3.  {   
  4.      string s = dt.Rows[i][12].ToString();   
  5.      if (ds.ContainsKey(s))   
  6.      {   
  7.          ds[s] = ds[s] + 1;   
  8.      }   
  9.      else   
  10.      {   
  11.          ds.Add(s, 1);   
  12.      }   
  13.      // data.Add(s);   
  14.  }   
  15.    
  16.  for (int i = 0; i < ds.Count(); i++)   
  17.  {   
  18.      var item = ds.ElementAt(i);   
  19.      var itemname = item.Key;   
  20.      var itemcount = item.Value;   
  21.      Response.Write("Name : " + itemname + " - - - - - Count :" + itemcount + " <br>");   
  22.      Response.Write("\n");   
  23.  }   
  24.    
The code above uses the Dictionary class.

Ok. Now if you upload a dddd.xls file the output will be as in the following:

Name : Root cause - - - - - Count :1
Name : OS: Due to oversight or negligence - - - - - Count :10
Name : CODE: Errors in existing code - - - - - Count :13
Name : DA: Data unavailable - - - - - Count :4
Name : CODE: A calculation is incorrect, unclear, or incomplete, Memory Leak, Incomplete or missing code - - - - - Count :4
Name : TEST DATA: Non Availability of Test Data - - - - - Count :8
Name : CODE: An error related to performance/optimality of the code - - - - - Count :3
Name : CODE: Inadequate or incorrect or misleading or missing error messages in source code - - - - - Count :1
Name : DOC: All documentation related causes - - - - - Count :1
Name : MIS COMM: Defects due to Miscommunication - - - - - Count :1
Name : hanu - - - - - Count :2  

Has Header?

Yes No

Now, I want show above the count of the repeated data in the chart. So for this we will use a chart control, it is available in the Toolbox.

  1. System.Web.UI.DataVisualization.Charting.ChartArea charea=   Chart1.ChartAreas[0];    
  2. System.Web.UI.DataVisualization.Charting.DataPointCollection dpc= Chart1.Series[0].Points;    
  3. charea.AxisX.Title = "duplicate data";    
  4. charea.AxisY.Title = "no of duplicate data";    
  5. dpc.AddXY(item.Name, item.count);    

The code above should continue the LINQ query of the for loop in the Import_To_grid() method. Here the LINQ properties are item.Name and item.Count as the data source of the chart control.

Note: the LINQ or Dictionary and chart control code should be continued in the Import_To_Grid() method.

You can also download the total program from the attachment.

When you run this program you'll get the output as below: 



I hope it will be helpful for you. Please post your comments.

Thank you!