Jayanta Mukherjee

Jayanta Mukherjee

  • 1.2k
  • 120
  • 38.5k

Remove duplicate rows from excel using spire.xls & c#

May 22 2021 1:39 AM
Hi all,
 
I'm trying to remove the duplicate rows from excel files in the fastest way possible.
 
I initially tried the below code (but it's very slow)
  1. private void RemoveDuplicate(Worksheet sheet)  
  2.         {  
  3.   
  4.             for (int i = 0; i < sheet.Columns[0].Count - 1; i++)  
  5.             {  
  6.                 for (int j = i + 1; j < sheet.Columns[0].Count; j++)  
  7.                 {  
  8.                     if (sheet.Columns[0].CellList[i].Value == sheet.Columns[0].CellList[j].Value)  
  9.                     {  
  10.                         for (int col = 0; col < sheet.Rows[0].Count; col++)  
  11.                         {  
  12.                             if (col == sheet.Rows[0].Count - 1 && sheet.Rows[i].CellList[col].Value == sheet.Rows[j].CellList[col].Value)  
  13.                             {  
  14.                                 sheet.DeleteRow(j + 1);  
  15.                                 j--;  
  16.                             }  
  17.                             else if (col != sheet.Rows[0].Count - 1 && sheet.Rows[i].CellList[col].Value == sheet.Rows[j].CellList[col].Value)  
  18.                             {  
  19.                                 continue;  
  20.                             }  
  21.                             else  
  22.                             {  
  23.                                 break;  
  24.                             }  
  25.                         }  
  26.                     }  
  27.                 }  
  28.   
  29.             }  
  30.         }  
 Then using it like
  1. testWorkbook.LoadFromFile(filePath);  
  2. testSheet = testWorkbook.Worksheets[0];  
  3.   
  4. RemoveDuplicate(testSheet);  
 Seeing it's slowness I tried a different approach (show below), but it is not working:
  1. private void RemoveDuplicate2(Worksheet sheet, CellRange cellRanges)  
  2.         {  
  3.             // get the duplicated row numbers  
  4.             var duplicatedRows = cellRanges.Rows  
  5.                 .GroupBy(x=> x.Columns[0].Value)  
  6.                 .Where(x=> x.Count() > 1)  
  7.                 .Select(x=> x.Last().Columns[0].Row)  
  8.                 .ToList();  
  9.               
  10.             foreach (var element in duplicatedRows)  
  11.             {  
  12.                 sheet.DeleteRow(element);  
  13.             }  
  14.               
  15.         }  
  Then using it like
  1. testWorkbook.LoadFromFile(filePath);  
  2. testSheet = testWorkbook.Worksheets[0];  
  3.   
  4.   
  5. var cellRanges = testSheet.Range["A2:G" + testSheet.Columns[0].Count];  
  6.   
  7. RemoveDuplicate2(testSheet, cellRanges);  
Please help me on this ... 

Answers (3)