SMTP Internationalization

You can find many articles dedicated to C# SMTP implementation on this or other sites. I'm not going to dwell on protocol implementation details but rather on the issue of sending e-mail in languages other than English (I'll use Russian in our scenario). English-only e-mail messaging systems use the 7-bit System.Text.Encoding.ASCII encoding when text has to be converted to a sequence of bytes for network transmission. All such applications convert every non-English character (anything outside the 7-bit range 0x00-0x7F, such as the Cyrillic letters that occupy codes 0x80-0xFF in a single-byte code page) into '?', meaning that there is no proper representation for that character.
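The loss is easy to reproduce. The sketch below (the class and method names are hypothetical) encodes a short Russian phrase with System.Text.Encoding.ASCII and decodes it again; the default replacement fallback of ASCIIEncoding turns each Cyrillic letter into a single '?'. The Cyrillic text is written with \u escapes so the example does not depend on the encoding of the source file.

```csharp
using System.Text;

// Hypothetical demo class: shows what an ASCII-only mailer puts on the wire.
public static class AsciiLossDemo
{
    // Encodes text as 7-bit ASCII and decodes it back. Every character
    // outside 0x00-0x7F is replaced by '?' by the default encoder fallback.
    public static string RoundTripAscii(string text)
    {
        byte[] wire = Encoding.ASCII.GetBytes(text);
        return Encoding.ASCII.GetString(wire);
    }
}

// Usage: "\u041F\u0440\u0438\u0432\u0435\u0442" is the Russian word for "Hello".
// AsciiLossDemo.RoundTripAscii("\u041F\u0440\u0438\u0432\u0435\u0442, world")
// returns "??????, world".
```

Only the six Cyrillic letters are damaged; the ASCII part of the string survives unchanged.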

A simple solution to this problem is to use the System.Text.Encoding instance that corresponds to the source text's encoding scheme. The source character set usually corresponds to the one selected in Control Panel/Regional Settings:

[Screenshot: Control Panel/Regional Settings (win_charset.jpg)]

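I use Russian as my default language, so all Cyrillic characters appear properly inside text areas and on title bars. Fortunately, there is an easy way to find out which default encoding scheme Windows uses (this holds on the .NET Framework, where Encoding.Default returns the system ANSI code page; on .NET Core and .NET 5+ it always returns UTF-8):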

System.Text.Encoding sourceEncoding = System.Text.Encoding.Default;

A little test

Console.WriteLine( "Windows charset: " + sourceEncoding.HeaderName );
Console.WriteLine( "Windows code page: " + sourceEncoding.CodePage );

shows that we are on the right track:

> Windows charset: windows-1251
> Windows code page: 1251

Now the e-mail can be properly encoded for transmission. We just need to add a character set identifier to the message header:

text.AppendFormat( "Content-Type: text/plain;\r\n\tcharset=\"{0}\"\r\n", sourceEncoding.HeaderName );

where text is a StringBuilder variable containing the resulting message text. The message is then transmitted like this:

byte[] data = sourceEncoding.GetBytes( text.ToString() );
smtpStream.Write( data, 0, data.Length );
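Besides the charset, a MIME message normally declares MIME-Version and a Content-Transfer-Encoding. The helper below is a minimal sketch (MimeHeaderBuilder and PlainTextHeaders are hypothetical names) that builds such a header block around the same AppendFormat call. It assumes the body is sent as raw 8-bit bytes, which strictly requires a server that advertises the 8BITMIME extension (RFC 6152); otherwise the body should be quoted-printable or base64 encoded.

```csharp
using System.Text;

// Hypothetical helper: header lines for a plain-text body in 'charset'.
// Assumes an 8-bit transfer (server supports 8BITMIME, RFC 6152).
public static class MimeHeaderBuilder
{
    public static string PlainTextHeaders(string charset)
    {
        var text = new StringBuilder();
        text.Append("MIME-Version: 1.0\r\n");
        // Same folded Content-Type line as in the article.
        text.AppendFormat("Content-Type: text/plain;\r\n\tcharset=\"{0}\"\r\n", charset);
        text.Append("Content-Transfer-Encoding: 8bit\r\n");
        // The caller still appends an empty line ("\r\n") before the body.
        return text.ToString();
    }
}
```

Passing sourceEncoding.HeaderName as the argument produces the same Content-Type line as the original one-liner, plus the two additional MIME headers.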

That would be all, but in the real world things are not that simple. For historical reasons, Russian-speaking countries use the KOI-8 (KOI8-R) encoding as the de facto e-mail standard (not everyone uses Windows, so code page 1251 might not be supported on some DOS or UNIX systems). That's why I set my default e-mail encoding in Outlook Express to KOI-8 (Options/Send), so I can chat with my 'non-Windows' buddies:

[Screenshot: Outlook Express Options/Send dialog (oe_charset.jpg)]

A little investigation reveals that the same value is also exposed by the default encoding object through its BodyName property:

Console.WriteLine( "E-Mail charset: " + sourceEncoding.BodyName );

> E-Mail charset: koi8-r
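Why the label matters: the same Cyrillic letters map to different byte values in the two code pages. The sketch below (CyrillicBytes and HexBytes are hypothetical names) encodes the capital and small letter A (U+0410, U+0430) in both charsets. It assumes .NET 5 or later, where legacy code pages must first be registered through CodePagesEncodingProvider; on the .NET Framework they are built in and the RegisterProvider call should be removed.

```csharp
using System;
using System.Text;

// Hypothetical demo: compares windows-1251 and KOI8-R byte values.
public static class CyrillicBytes
{
    static CyrillicBytes()
    {
        // .NET 5+: windows-1251 and koi8-r live in the code-pages provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Returns the encoded bytes as hex, e.g. "C0-E0".
    public static string HexBytes(string text, string charset)
    {
        byte[] data = Encoding.GetEncoding(charset).GetBytes(text);
        return BitConverter.ToString(data);
    }
}

// Usage, for the two letters "\u0410\u0430":
//   CyrillicBytes.HexBytes("\u0410\u0430", "windows-1251") returns "C0-E0"
//   CyrillicBytes.HexBytes("\u0410\u0430", "koi8-r")       returns "E1-C1"
```

A mail client that reads KOI8-R bytes as windows-1251 (or vice versa) therefore shows the wrong letters, which is why the charset header and the bytes on the wire must agree.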

Luckily, the static method System.Text.Encoding.Convert() converts an array of bytes from one encoding scheme to another. The following snippet must run before the message is sent. Don't forget that the resulting code page is now different, so the 'Content-Type' charset header must refer to sourceEncoding.BodyName.

using System.Text;
// ............
Encoding srcEnc = Encoding.Default;
Encoding dstEnc;
// src & dst refer to the same object if no intermediate conversion is required
if( srcEnc.HeaderName.Equals( srcEnc.BodyName ) )
    dstEnc = srcEnc;
else
    dstEnc = Encoding.GetEncoding( srcEnc.BodyName );
// ............
byte[] srcData = srcEnc.GetBytes( messageString );
byte[] dstData;
// see if we need to convert data
if( dstEnc != srcEnc )
    dstData = Encoding.Convert( srcEnc, dstEnc, srcData );
else
    dstData = srcData;
// write encoded data
smtpStream.Write( dstData, 0, dstData.Length );
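The snippet above relies on the surrounding program (messageString, smtpStream). Below is a self-contained sketch of the same conversion step with hypothetical names (MailBodyWriter, ToMailBytes, Write). Instead of reading Encoding.Default and BodyName it takes explicit charset names, and it writes to any Stream: a NetworkStream in a real mailer, a MemoryStream for testing. It assumes .NET 5 or later, where windows-1251 and koi8-r must be registered through CodePagesEncodingProvider.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper mirroring the conversion snippet above.
public static class MailBodyWriter
{
    static MailBodyWriter()
    {
        // .NET 5+: registers legacy code pages such as windows-1251 and koi8-r.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Encodes messageString in srcCharset, then converts the bytes to
    // dstCharset (the e-mail charset) when the two differ.
    public static byte[] ToMailBytes(string messageString, string srcCharset, string dstCharset)
    {
        Encoding srcEnc = Encoding.GetEncoding(srcCharset);
        Encoding dstEnc = Encoding.GetEncoding(dstCharset);
        byte[] srcData = srcEnc.GetBytes(messageString);
        if (srcEnc.CodePage == dstEnc.CodePage)
            return srcData; // same code page: nothing to convert
        return Encoding.Convert(srcEnc, dstEnc, srcData);
    }

    // Writes the converted bytes to the SMTP stream (or any other Stream).
    public static void Write(Stream smtpStream, string messageString, string srcCharset, string dstCharset)
    {
        byte[] data = ToMailBytes(messageString, srcCharset, dstCharset);
        smtpStream.Write(data, 0, data.Length);
    }
}
```

Because Encoding.Convert decodes the source bytes to Unicode and re-encodes them, calling dstEnc.GetBytes( messageString ) directly yields the same bytes for text that both code pages can represent; the two-step form simply mirrors the article's snippet.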

