Mister Muv
posted 1 posts
since Dec 11, 2008 

 C# - Remove duplicate lines from a text file
  Posted on: 11 Dec 2008       

Hi,

I am trying to remove duplicate lines from a text file. To make things difficult, the duplicates are not identical: they share the same reference number but have different timestamps. Some references are repeated on as many as 10 lines, whereas others appear on only 2.

Here are some examples of duplicate lines, in the format <timestamp>,<reference>,<error message>:

08:47:22,95847170050,Problem inputting data.
08:48:28,96672540040,More problems inputting data.
08:49:29,95847170050,Problem inputting data.
08:55:28,106622510040,Extra issues inputting data.
08:56:35,95847170050,Problem inputting data.
08:57:35,106622510040,Extra issues inputting data.
09:02:35,96672540040,More problems inputting data.
09:03:41,96672540040,More problems inputting data.
09:04:41,106622510040,Extra issues inputting data.

I want to delete all the duplicates but KEEP the most recent line for each reference.

I am new to C#. I originally wrote a Java program to do this but was told to rewrite it in C#.

 

To assist, here is the Java code.

/*
Contents of the text file is read into an ArrayList (allData)
Unique reference values are then extracted from allData and populated into references (another ArrayList)
*/
 
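// Imports needed by the fragments below.
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;

static DateFormat df = new SimpleDateFormat("HH:mm:ss");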
 
...
 
ArrayList latest = getLatestEntries(allData, references);
 
 
private static ArrayList<String> getLatestEntries(ArrayList<String> allData, ArrayList<String> references) {
    // For each reference, save the latest entry.
    ArrayList<String> list = new ArrayList<String>();
    for (String ref : references) {
        Date date = null;
        int maxValIndex = -1; // index of the latest entry found so far for ref
        for (int j = 0; j < allData.size(); j++) {
            String[] fields = allData.get(j).split(",");
            if (fields.length < 2 || !fields[1].equals(ref)) {
                continue; // not an entry for this reference
            }
            Date nextDate = parse(fields[0]);
            if (nextDate == null) {
                continue; // timestamp could not be parsed; skip the line
            }
            if (date == null || nextDate.compareTo(date) > 0) {
                date = nextDate;
                maxValIndex = j;
            }
        }
        // Only add an entry if at least one valid line matched the reference.
        if (maxValIndex >= 0) {
            list.add(allData.get(maxValIndex));
        }
    }
    return list;
} // getLatestEntries
 
// Parses an HH:mm:ss timestamp; returns null if the text is not a valid time.
private static Date parse(String s) {
    try {
        return df.parse(s);
    } catch (ParseException e) {
        System.out.println("parse error: " + e.getMessage());
        return null;
    }
} // parse
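
The nested loops above scan allData once for every reference. A minimal single-pass sketch of the same idea follows, assuming every line has the form HH:mm:ss,<reference>,<message> and that all entries fall on the same day (the timestamps carry no date). The names LatestEntries and latestPerReference are hypothetical, and java.time.LocalTime (Java 8 or later) is swapped in for Date and SimpleDateFormat.

```java
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of the original program.
final class LatestEntries {

    // Returns one line per reference: the line with the latest timestamp.
    // Assumes each line looks like HH:mm:ss,<reference>,<message>.
    static List<String> latestPerReference(List<String> lines) {
        // LinkedHashMap keeps references in the order they were first seen.
        Map<String, String> latest = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",", 3);
            if (parts.length < 2) {
                continue; // skip lines that do not match the assumed layout
            }
            String ref = parts[1];
            String kept = latest.get(ref);
            // Replace the kept line when this one is not earlier than it.
            if (kept == null || !timeOf(line).isBefore(timeOf(kept))) {
                latest.put(ref, line);
            }
        }
        return new ArrayList<>(latest.values());
    }

    // Parses the leading HH:mm:ss field; throws DateTimeParseException on bad input.
    private static LocalTime timeOf(String line) {
        return LocalTime.parse(line.substring(0, line.indexOf(',')));
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "08:47:22,95847170050,Problem inputting data.",
                "08:48:28,96672540040,More problems inputting data.",
                "08:49:29,95847170050,Problem inputting data.",
                "09:02:35,96672540040,More problems inputting data.");
        for (String line : latestPerReference(sample)) {
            System.out.println(line);
        }
    }
}
```

The map is keyed by reference, so each line is examined once and the separate references list is no longer needed. If the log can run past midnight, a time-only comparison like this one would pick the wrong line, and a date would have to be added to each entry.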

 

I know the C# code will be more or less the same, apart from some capitalisation changes and replacing System.out.println with Console.WriteLine, but I am struggling with the Date to DateTime conversion.

 

Can someone help?


Thank you in advance.

 

Deepak
posted  2 posts
since  Dec 11, 2008 

Re: C# - Remove duplicate lines from a text file
  Posted on: 12 Dec 2008
Hi,

I found this problem very interesting and worth solving with LINQ. I have posted the code here: http://www.onedotnetway.com/remove-duplicate-lines-from-a-text-file-using-linq/
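
The linked code is not reproduced in this thread. As a rough illustration of what a grouping-style query looks like, here is a sketch using Java streams (the language of the original program) rather than LINQ. It is not the code from the link: GroupedLatest and latestPerReference are hypothetical names, and the HH:mm:ss,<reference>,<message> line layout is assumed.

```java
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

// Hypothetical helper class illustrating a group-then-pick-latest query.
final class GroupedLatest {

    static List<String> latestPerReference(List<String> lines) {
        // Orders lines by their leading HH:mm:ss field (assumed to be valid).
        Comparator<String> byTime =
                Comparator.comparing((String line) -> LocalTime.parse(line.split(",", 3)[0]));

        // Group by reference (second field); when two lines share a reference,
        // keep the later one. On equal timestamps maxBy keeps the first line.
        Map<String, String> latest = lines.stream()
                .collect(Collectors.toMap(
                        line -> line.split(",", 3)[1],
                        line -> line,
                        BinaryOperator.maxBy(byTime),
                        LinkedHashMap::new));

        return new ArrayList<>(latest.values());
    }
}
```

Unlike a loop with explicit checks, this version does not skip malformed lines: a line without a comma or with an unparseable time makes it throw, so the input would need to be filtered first.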

Hope you'll find this helpful.

Regards,

Deepak
One .Net Way
       