Search for repeated character combinations algorithm

Sep 14 2009 1:44 PM
Hi everybody,

I have a tool that reads a file. The file has, for example, the following content:

wvtnfhxyz1hdxyz1fdxyz1ejxyz1dhxyz1dxyz1eeaa1oeys

I would like to search for repeated character combinations. In the example above, the character combination xyz1 is repeated 6 times.

I would like to return a top 5 of the most repeated character combinations, something like this:

xyz1 : 6 times
aa5 : 5 times
ab4 : 3 times
ab : 2 times
66 : 2 times
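
For illustration only, a brute-force way to get such a list is to count every substring of a few fixed lengths in a dictionary and sort by count. This is a minimal sketch, not a tuned algorithm. The class name RepeatFinder, the method name TopRepeats, and the parameters minLen, maxLen and top are hypothetical names made up for this example, and it assumes the whole file content fits in one string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RepeatFinder
{
    // Hypothetical helper: counts every substring whose length is between
    // minLen and maxLen, then returns the 'top' most frequent ones that
    // occur at least twice. Overlapping occurrences are counted separately.
    public static List<KeyValuePair<string, int>> TopRepeats(
        string text, int minLen, int maxLen, int top)
    {
        var counts = new Dictionary<string, int>();

        for (int len = minLen; len <= maxLen; len++)
        {
            // Slide a window of 'len' characters across the text.
            for (int i = 0; i + len <= text.Length; i++)
            {
                string piece = text.Substring(i, len);
                int n;
                counts.TryGetValue(piece, out n); // n stays 0 if not seen yet
                counts[piece] = n + 1;
            }
        }

        return counts
            .Where(kv => kv.Value >= 2)                    // repeated only
            .OrderByDescending(kv => kv.Value)             // most frequent first
            .ThenByDescending(kv => kv.Key.Length)         // prefer longer matches
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)  // stable tie-break
            .Take(top)
            .ToList();
    }
}
```

Called as RepeatFinder.TopRepeats(text, 2, 4, 5) on the sample content above, the first entry is xyz1 with a count of 6. The shorter pieces of it (xyz, yz1, xy, ...) also occur 6 times and fill the rest of the list, so a real solution would probably need an extra step that drops substrings already covered by a longer match.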

This is the code that reads the file:


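// Requires: using System.IO; using System.Text; using System.Windows.Forms;
string path = openFileDialog1.FileName;
try
{
    // Open the stream and read it back.
    using (FileStream fs = File.OpenRead(path))
    {
        byte[] b = new byte[1024];
        //UTF8Encoding temp = new UTF8Encoding(true);
        UTF7Encoding temp = new UTF7Encoding(true);
        int bytesRead;

        // Decode only the bytes actually read. The last chunk is usually
        // shorter than the buffer, so decoding all of b would append stale data.
        while ((bytesRead = fs.Read(b, 0, b.Length)) > 0)
        {
            textBox1.Text += temp.GetString(b, 0, bytesRead);
        }
    }
}
catch (IOException ex)
{
    // Report read failures instead of letting the exception escape.
    MessageBox.Show("Could not read the file: " + ex.Message);
}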



Thanks!

Answers (2)