Reason For File Size Mismatch In FTP And Local Machine

I always prefer to use binary mode in FTP transfers wherever possible. It guarantees that every byte of the file is copied and saves you from image file corruption. Image files such as JPEG and TIFF generally become corrupted when they are transferred in text mode.
This is the second article of the Troubleshooting series, which is as follows:
  1. Troubleshooting with log4net RollingFileAppender
  2. Current article, you are here
  3. Who eats FTP file bytes silently when transferred in binary mode

Why the file size varies between an FTP server and a local machine

Well, often someone was just comparing the file size on disk. Size on disk is different from the size of the contents. To understand this, you must understand how the operating system manages its file system. Whenever the OS allocates disk space to store the contents of a file, it allocates it in fixed-size units. This fixed unit is known as a cluster, and its size varies among file systems. Microsoft itself uses several file systems, such as FAT16, FAT32 and NTFS (depending on the OS). A file therefore occupies a whole number of clusters even when its contents do not fill the last one, which explains why the size on disk may differ between two machines. No harm in it. Sounds good?
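
To make the cluster arithmetic concrete, here is a minimal sketch in Python. The function name `size_on_disk` and the cluster sizes used (4096 and 32768 bytes) are assumptions chosen for illustration, not values taken from any particular machine. The sketch also ignores special cases such as compression, sparse files, or very small files that a file system may store inside its own metadata.

```python
def size_on_disk(size_bytes: int, cluster_size: int = 4096) -> int:
    """Round a file's content size up to a whole number of clusters.

    cluster_size is an assumed value; check the real one for your volume.
    """
    if size_bytes < 0 or cluster_size <= 0:
        raise ValueError("size must be >= 0 and cluster size must be > 0")
    # Integer ceiling division: how many clusters are needed.
    clusters = (size_bytes + cluster_size - 1) // cluster_size
    return clusters * cluster_size


if __name__ == "__main__":
    content_size = 10_000  # the same 10,000 bytes of content on both volumes
    for cluster in (4096, 32768):
        print(cluster, size_on_disk(content_size, cluster))
```

The same 10,000 bytes of content occupy 12,288 bytes with 4 KB clusters and 32,768 bytes with 32 KB clusters, although the file itself is identical.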

When the file size itself mismatches, not just the size on disk

In that case, you really need to troubleshoot. Always remember to verify the size in bytes, not in KB or MB, because rounded figures can hide a small difference. If there is still a mismatch in bytes, then first zero in on the source file. Chances are that your source file is corrupted. You may argue that it is not, because you are able to open it on your local machine. You may be puzzled to find that locally (on the source machine) you can see all the contents of your text file, but not on the remote machine, because the file has been silently and only partially posted/copied/uploaded. Missing bytes/data in your text files? It is a surprising thing, no?
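
The byte-level check described above can be sketched in Python as follows. The helper names (`shown_kb`, `file_fingerprint`, `same_content`) are hypothetical, and rounding up to whole KB is only an assumption about how a file manager might display sizes. The sketch compares two local copies, for example the source file and a copy downloaded back from the server.

```python
import hashlib
import os


def shown_kb(size_bytes: int) -> int:
    """Size as a file manager might display it, rounded up to whole KB."""
    return (size_bytes + 1023) // 1024


def file_fingerprint(path: str, chunk_size: int = 65536):
    """Return (size_in_bytes, sha256_hex) for the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:  # binary mode: count the bytes as stored
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def same_content(path_a: str, path_b: str) -> bool:
    """True only if both files have the same byte count and the same hash."""
    return file_fingerprint(path_a) == file_fingerprint(path_b)


if __name__ == "__main__":
    # Two different byte counts can look identical once rounded to KB.
    print(shown_kb(10_001), shown_kb(10_200))
```

Both sizes in the example display as 10 KB, although they differ by 199 bytes. For the copy on the server, Python's `ftplib.FTP.size()` can report the byte count on servers that support the SIZE command.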

Causes / Finding(s)

  • The text file may have been transferred (to FTP) in a mode other than ASCII/text

    It is always advisable to use binary mode to make an exact byte copy, but there are exceptions. Binary mode can behave strangely with respect to end-of-line (EOL) characters. Some operating systems use only a carriage return to indicate a new line, others use only a line feed, and the Microsoft operating systems use both, a carriage return followed by a line feed, for each new line. When a text file is transferred in binary mode, this conversion is not done, so on a system with a different convention the file can show junk characters or run its lines together. Conversely, ASCII/text mode rewrites the line endings, so the byte count on the server can legitimately differ from the local one. So, it is a good idea to have a switch in code that changes the transfer mode from binary to ASCII/text for files like txt, htm, CSS and so on. Note that this can itself cause problems in some cases, because nowadays text files are often Unicode encoded (for example UTF-16, which uses two or more bytes per character) to support languages such as Chinese, and a line-ending conversion that works byte by byte can damage them.
  • Your source file itself has been copied from another remote machine (using MSTSC/RDP)

    In this case, text files are the more likely victims of corruption. It is always advisable to zip your files on the server first and then copy the archive. After downloading the Zip file, unzip it to restore the files in their original form. In this way you can at least minimize the risk of corruption.
  • Your system may have a proxy firewall that is interrupting the transfer
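
For the first cause in the list above, a transfer-mode switch could be sketched in Python with the standard `ftplib` module. The extension set `TEXT_EXTENSIONS` and the helper names `is_text_file`, `upload` and `size_after_crlf_to_lf` are assumptions made for illustration, and the sketch presumes single-byte text files (not UTF-16).

```python
import os
from ftplib import FTP

# Assumption: extensions treated as plain text; adjust to your own files.
TEXT_EXTENSIONS = {".txt", ".htm", ".html", ".css"}


def is_text_file(filename: str) -> bool:
    """Decide by extension whether ASCII/text mode should be used."""
    return os.path.splitext(filename)[1].lower() in TEXT_EXTENSIONS


def upload(ftp: FTP, local_path: str, remote_name: str) -> str:
    """Upload one file and return the mode used ("ascii" or "binary")."""
    # ftplib expects a file opened in binary mode for both calls.
    with open(local_path, "rb") as fh:
        if is_text_file(local_path):
            # storlines sends the file line by line in ASCII mode.
            ftp.storlines(f"STOR {remote_name}", fh)
            return "ascii"
        # storbinary sends the bytes unchanged.
        ftp.storbinary(f"STOR {remote_name}", fh)
        return "binary"


def size_after_crlf_to_lf(data: bytes) -> int:
    """Byte count once each CR+LF pair has become a single LF."""
    return len(data.replace(b"\r\n", b"\n"))


if __name__ == "__main__":
    windows_text = b"line one\r\nline two\r\n"
    print(len(windows_text), size_after_crlf_to_lf(windows_text))
```

The last lines show why a correctly converted text file can still differ in size: the 20-byte Windows-style sample becomes 18 bytes when each line ending shrinks from two bytes to one. Calling `upload` requires an already connected and logged-in `FTP` object, which is left out here because host and credentials are specific to your setup.
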
Thanks for reading.