File Hash Comparison With MD5 And SHA1

Introduction

In general, if we need to detect changes in a file system or a directory of files, we use the FileSystemWatcher class provided in .NET. However, once you learn its side effects, its notifications look more like hints than a reliable record of changes. A single save can raise more than one event, events can be lost when its internal buffer overflows, and nothing is reported for changes made while the application is not running. Another reason not to rely on FileSystemWatcher is that it reports file system events (writes, renames, attribute changes) rather than whether a file's content actually differs. So I have found hashing a better way.

Background

In this article I will try to answer a common question programmers ask about hashing: how long will it take to compute the hash of every file in my directory, including the files in sub folders? Will it be fast enough for a typical application deployment folder of a few MB? To answer these questions I wrote a small utility and ran it on my own folder structure, which has around 45 files totalling a few MB. The result was fast enough: computing the hashes took only 50 to 60 milliseconds, and validating them took about the same time.
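To reproduce this kind of measurement outside the form, here is a minimal sketch, assuming SHA1 and a hex-encoded digest. `DirectoryHashTimer` and `HashDirectory` are hypothetical names used only for illustration; the .NET calls (`Directory.GetFiles` with `SearchOption.AllDirectories`, `SHA1.Create`, `Stopwatch`) are standard. It hashes every file under a folder, including sub folders, and reports the elapsed time for the whole pass.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper (not from the article): hash a whole folder tree and time it.
public static class DirectoryHashTimer
{
    // Returns full file path -> upper-case hex SHA1 digest; elapsed covers the whole pass.
    public static Dictionary<string, string> HashDirectory(string root, out TimeSpan elapsed)
    {
        var stopwatch = Stopwatch.StartNew();
        var hashes = new Dictionary<string, string>();

        using (var sha = SHA1.Create())
        {
            // AllDirectories includes files in every sub folder.
            foreach (string path in Directory.GetFiles(root, "*", SearchOption.AllDirectories))
            {
                using (FileStream stream = File.OpenRead(path))
                {
                    // ComputeHash re-initializes the algorithm, so one instance is reused.
                    hashes[path] = BitConverter.ToString(sha.ComputeHash(stream)).Replace("-", "");
                }
            }
        }

        stopwatch.Stop();
        elapsed = stopwatch.Elapsed;
        return hashes;
    }
}
```

A call looks like `var hashes = DirectoryHashTimer.HashDirectory(@"C:\Deploy", out TimeSpan elapsed);` (the path is a placeholder). Timings depend on disk caching: a second run over the same files is often faster because the operating system already holds the data in memory.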

Using the code

Please observe the code file below. I tried computing the hash with both the MD5 and SHA1 algorithms; for this folder both took about the same time. Note that we are hashing the actual file content. Any change in content, even one added space or character, changes the hash of the whole file. It is equally important to note that changes to file attributes, such as the last modification time, do not affect the hash.
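The claim that content edits change the hash while metadata changes do not can be checked directly. The sketch below, using the hypothetical names `ContentVsMetadataDemo`, `Sha1Hex`, and `Run`, writes a file, moves its last-write time back one day, then appends a single space, hashing after each step.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical demo (not from the article): content hash vs. file metadata.
public static class ContentVsMetadataDemo
{
    public static string Sha1Hex(string path)
    {
        using (var sha = SHA1.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            return BitConverter.ToString(sha.ComputeHash(stream)).Replace("-", "");
        }
    }

    // Returns whether the hash survived a timestamp change and whether a one-space edit changed it.
    public static (bool SameAfterTouch, bool DifferentAfterEdit) Run(string path)
    {
        File.WriteAllText(path, "deploy v1");
        string original = Sha1Hex(path);

        // Metadata only: the bytes on disk are untouched.
        File.SetLastWriteTime(path, DateTime.Now.AddDays(-1));
        string afterTouch = Sha1Hex(path);

        // Content: one extra space is enough to change the digest.
        File.AppendAllText(path, " ");
        string afterEdit = Sha1Hex(path);

        return (original == afterTouch, original != afterEdit);
    }
}
```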

using System;
using System.IO;
using System.Security.Cryptography;

public class DeploymentFile
{
    public string FilePath { get; set; }
    public bool IsFilePathValid { get; set; }
    public string HashedValue { get; set; }
    public bool IsFileModified { get; set; }

    public DeploymentFile(string filePath)
    {
        FilePath = filePath;
        IsFilePathValid = true;
        IsFileModified = false;
        // Hash the file if it exists; otherwise mark its path as invalid.
        if (File.Exists(filePath))
            HashedValue = ComputeHashSHA(filePath);
        else
            IsFilePathValid = false;
    }

    public bool IsExist(string filePath)
    {
        return File.Exists(filePath);
    }

    // MD5 alternative: call this from the constructor instead of ComputeHashSHA.
    public string ComputeHashMD5(string filename)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(filename))
        {
            // Hex-encode the digest. Decoding raw hash bytes with Encoding.Default
            // can map different byte values to the same character and lose information.
            return BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "");
        }
    }

    public string ComputeHashSHA(string filename)
    {
        using (var sha = SHA1.Create())
        using (var stream = File.OpenRead(filename))
        {
            return BitConverter.ToString(sha.ComputeHash(stream)).Replace("-", "");
        }
    }
}
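Since `MD5` and `SHA1` both derive from `HashAlgorithm`, the MD5 and SHA1 methods of `DeploymentFile` could share one implementation. A minimal sketch, assuming the caller passes a factory such as `() => MD5.Create()`; `FileHasher` is a hypothetical name. It also shows hex encoding with `ToString("X2")` as an alternative to `BitConverter.ToString`.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper (not from the article): one hashing method for any HashAlgorithm.
public static class FileHasher
{
    public static string ComputeHash(string filename, Func<HashAlgorithm> createAlgorithm)
    {
        using (HashAlgorithm algorithm = createAlgorithm())
        using (FileStream stream = File.OpenRead(filename))
        {
            byte[] digest = algorithm.ComputeHash(stream);

            // "X2" writes each byte as exactly two upper-case hex digits.
            var hex = new StringBuilder(digest.Length * 2);
            foreach (byte b in digest)
                hex.Append(b.ToString("X2"));
            return hex.ToString();
        }
    }
}
```

For example, `FileHasher.ComputeHash(path, () => SHA1.Create())` returns a 40-character hex string, and passing `() => MD5.Create()` returns a 32-character one, so switching algorithms is a one-argument change.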
Shown below is the code for the form that hosts all the controls. I use a Stopwatch to measure the time taken by the whole validation process, which recomputes the hash of every file and compares it with the stored value.

Important: if the message box appears, the stopwatch keeps running until the user clicks and closes it, so that wait is included in the measured time. To measure accurately, disable the message box.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Windows.Forms;

public partial class FileValidator : Form
{
    public FileValidator()
    {
        InitializeComponent();
    }

    List<DeploymentFile> DeployList;
    List<DeploymentFile> ValidationList;
    string filePath;

    #region ComputeHash
    private void ComputeHash_Click(object sender, EventArgs e)
    {
        // Take the baseline: hash every file currently in the deployment folder.
        DeployList = new List<DeploymentFile>();
        foreach (var item in GetListOfFilesInDeployFolder())
            DeployList.Add(new DeploymentFile(item));
        FilesGrid.DataSource = DeployList;
    }
    #endregion ComputeHash

    #region ValidateFileHash
    private void ValidateHash_Click(object sender, EventArgs e)
    {
        // Validation needs a baseline from ComputeHash_Click first.
        if (DeployList == null)
        {
            MessageBox.Show("Compute the hash before validating.");
            return;
        }

        Stopwatch stopwatch = new Stopwatch();
        // Begin timing.
        stopwatch.Start();
        bool Abort = false;
        List<string> filesList = GetListOfFilesInDeployFolder();

        // Re-hash every file recorded in the baseline.
        ValidationList = new List<DeploymentFile>();
        foreach (var item in DeployList)
            ValidationList.Add(new DeploymentFile(item.FilePath));

        // Files added or deleted: check the count first so filesList[i] never goes out of range.
        if (ValidationList.Count != filesList.Count)
        {
            Abort = true;
        }
        else
        {
            for (int i = 0; i < ValidationList.Count; i++)
            {
                if (ValidationList[i].FilePath != filesList[i]) Abort = true;
            }
        }

        // Every recorded file must still exist in the directory.
        if (!Abort && ValidationList.Exists(x => !x.IsFilePathValid))
            Abort = true;

        if (Abort)
        {
            // Disable this message box to measure an accurate execution time with the stopwatch.
            MessageBox.Show("Files/Folder structure changed or modified since last check");
        }
        else
        {
            // Same structure: flag files whose content hash differs from the baseline.
            for (int i = 0; i < ValidationList.Count; i++)
            {
                if (ValidationList[i].HashedValue != DeployList[i].HashedValue)
                    ValidationList[i].IsFileModified = true;
            }
        }

        FilesGrid.DataSource = ValidationList;

        stopwatch.Stop();
        label1.Text = "Time taken in Validation : " + stopwatch.Elapsed;
    }
    #endregion ValidateFileHash

    private List<string> GetListOfFilesInDeployFolder()
    {
        filePath = textBox1.Text;
        // Directory.GetFiles does not guarantee an order, so sort the paths;
        // the index-by-index comparison above relies on a stable order.
        return Directory.GetFiles(filePath, "*", SearchOption.AllDirectories)
                        .OrderBy(p => p, StringComparer.Ordinal)
                        .ToList();
    }

    private void FileValidator_Load(object sender, EventArgs e)
    {
        FilesGrid.AutoSizeColumnsMode = DataGridViewAutoSizeColumnsMode.DisplayedCells;
    }
}
[Screenshot: the validator form showing computation and validation times]

The screenshot above displays the time taken for computation and validation of the hash. If a file is modified between clicking the compute hash button and the check for modifications button, the modification shows up in the IsFileModified column. I also record the file structure and compare it with the current one: a recorded file that no longer exists shows False in the IsFilePathValid column, and any added, removed, or renamed file triggers the "Files/Folder structure changed" message, in which case the content comparison is skipped.
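The form compares the baseline and the new list position by position, which relies on both file lists coming back in the same order. An alternative, sketched here under the assumption that each snapshot is a dictionary from file path to hash (built by hashing each file, as DeploymentFile does), classifies every path as modified, added, or removed without depending on order. `SnapshotDiff`, `Compare`, and `FileChange` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical types (not from the article): order-independent snapshot comparison.
public enum FileChange { Modified, Added, Removed }

public static class SnapshotDiff
{
    public static Dictionary<string, FileChange> Compare(
        IReadOnlyDictionary<string, string> before,
        IReadOnlyDictionary<string, string> after)
    {
        var changes = new Dictionary<string, FileChange>();

        foreach (var entry in before)
        {
            if (!after.TryGetValue(entry.Key, out string newHash))
                changes[entry.Key] = FileChange.Removed;   // path no longer present
            else if (newHash != entry.Value)
                changes[entry.Key] = FileChange.Modified;  // same path, different content
        }

        // Paths present now but missing from the first snapshot are new files.
        foreach (string path in after.Keys.Where(p => !before.ContainsKey(p)))
            changes[path] = FileChange.Added;

        return changes;
    }
}
```

Unchanged files simply do not appear in the result. Keying the dictionaries by path relative to the deployment root, rather than by full path, would also let snapshots taken on different machines be compared.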

Points of Interest

It is interesting that SHA1 and MD5 take a similar time for a small number of files. In my comparison, as the file count and file sizes grow, MD5 becomes more efficient than SHA1. SHA1, with its longer 160-bit digest (versus 128 bits for MD5), has traditionally been the more trusted of the two in developer circles, although practical collision attacks now exist for both. I think MD5 is the better fit here because we are not defending against an attacker; we only care about the integrity of file content. Below are some comparisons of hash algorithms.

[Chart: speed comparison of hash algorithms]
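To check the speed comparison on your own hardware, here is a small benchmark sketch. It hashes the same random in-memory buffer with MD5 and SHA1 and times each with Stopwatch, after one warm-up call so JIT compilation is not counted. `HashBenchmark` is a hypothetical name, and the buffer size and iteration count are arbitrary parameters you choose.

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;

// Hypothetical benchmark (not from the article): MD5 vs. SHA1 on identical data.
public static class HashBenchmark
{
    public static TimeSpan Time(HashAlgorithm algorithm, byte[] data, int iterations)
    {
        algorithm.ComputeHash(data); // warm-up so JIT compilation is not measured
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            algorithm.ComputeHash(data);
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static (TimeSpan Md5, TimeSpan Sha1) Compare(int sizeInBytes, int iterations)
    {
        var data = new byte[sizeInBytes];
        new Random(42).NextBytes(data); // fixed seed so every run hashes the same bytes

        using (var md5 = MD5.Create())
        using (var sha1 = SHA1.Create())
        {
            return (Time(md5, data, iterations), Time(sha1, data, iterations));
        }
    }
}
```

Hashing from memory isolates the cost of the algorithm from disk I/O. With real files, reading from disk can dominate the total, which may be why both algorithms look similar for a few small files. Results depend on the CPU and .NET version, so measure on the target machine.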