Eric Bergman-Terrell's Blog

.NET Programming Tip: How to Determine the Encoding of a Unicode File
October 4, 2010

The StreamReader class allows you to read in Unicode text from a file without having to worry about the precise encoding:

...
StreamReader SR = new StreamReader(FileName, true);

String Contents = SR.ReadToEnd();

SR.Close();
...

For example, the above code works for Unicode files having the following Encodings: Encoding.BigEndianUnicode, Encoding.Unicode, and Encoding.UTF8. It also works if the file is encoded in Encoding.ASCII format, because ASCII bytes are also valid UTF-8, which is what the reader falls back to when it finds no BOM.

The file's encoding is automatically detected because the StreamReader constructor's second argument (detectEncodingFromByteOrderMarks) is true.

There's no problem reading in Unicode text using the StreamReader. The problem is writing updated text back to the file with the original Encoding intact. For example, if your program reads text in Encoding.BigEndianUnicode format, it should write it back in the same format.

Unfortunately the StreamReader object doesn't make the original Encoding easy to get at. Be careful with the CurrentEncoding member: it reports Encoding.UTF8 until the first read, because the BOM is only examined on the first call to a Read method. It was always Encoding.UTF8 when I experimented with it, regardless of the text file's actual Encoding.
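The .NET documentation says that CurrentEncoding can change after the first read, since detection is deferred until then. Here is a minimal sketch for checking that on your own runtime. The class and method names (CurrentEncodingDemo, Probe) are made up for this example and are not part of .NET. It writes a small temp file in the given encoding and records CurrentEncoding before and after ReadToEnd().

```csharp
using System;
using System.IO;
using System.Text;

public static class CurrentEncodingDemo
{
    // Hypothetical helper: returns { encoding before the first read, encoding after it }.
    public static Encoding[] Probe(Encoding encoding)
    {
        string path = Path.GetTempFileName();

        try
        {
            // File.WriteAllText emits the encoding's BOM ahead of the text.
            File.WriteAllText(path, "hello", encoding);

            using (StreamReader reader = new StreamReader(path, true))
            {
                Encoding before = reader.CurrentEncoding; // nothing has been read yet

                reader.ReadToEnd();                       // BOM detection happens here

                Encoding after = reader.CurrentEncoding;

                return new Encoding[] { before, after };
            }
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Calling CurrentEncodingDemo.Probe(Encoding.BigEndianUnicode) and comparing the CodePage of the two results should show whether your runtime updates the property after the read. Even if it does, you still have to read from the file before you can ask.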

So how can you use a StreamWriter to write back text read from a StreamReader, with the original encoding intact? Use the following code to determine the file's original encoding, and specify that encoding in the StreamWriter's constructor.

Unicode files usually start with a short prefix called a BOM (Byte Order Mark) that identifies the exact Encoding of the file. The BOM is two bytes long for the UTF-16 encodings (Encoding.Unicode and Encoding.BigEndianUnicode) and three bytes long for Encoding.UTF8. GetFileEncoding() iterates through various Unicode Encoding values and compares the file's BOM with the current Encoding's BOM (returned by the GetPreamble() member). When a match is found, the corresponding Encoding value is returned. If no matches are found, the Encoding.Default value is returned.
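To see what is being compared, this small sketch prints the bytes that GetPreamble() returns for each encoding. PreambleDemo, PreambleHex and PrintAll are names I made up for the illustration.

```csharp
using System;
using System.Text;

public static class PreambleDemo
{
    // Formats an encoding's BOM as space-separated hex bytes ("" if it has none).
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble()).Replace("-", " ");
    }

    public static void PrintAll()
    {
        Console.WriteLine("BigEndianUnicode: " + PreambleHex(Encoding.BigEndianUnicode)); // FE FF
        Console.WriteLine("Unicode:          " + PreambleHex(Encoding.Unicode));          // FF FE
        Console.WriteLine("UTF8:             " + PreambleHex(Encoding.UTF8));             // EF BB BF
        Console.WriteLine("ASCII:            " + PreambleHex(Encoding.ASCII));            // no BOM
    }
}
```

Encoding.ASCII has an empty preamble, which is why a plain ASCII file can never be recognized by its BOM.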

// Return the Encoding of a text file.  Return Encoding.Default if no Unicode
// BOM (byte order mark) is found.
public static Encoding GetFileEncoding(String FileName)
{
    Encoding Result = null;

    FileInfo FI = new FileInfo(FileName);

    FileStream FS = null;

    try
    {
        FS = FI.OpenRead();

        Encoding[] UnicodeEncodings = { Encoding.BigEndianUnicode, Encoding.Unicode, Encoding.UTF8 };

        for (int i = 0; Result == null && i < UnicodeEncodings.Length; i++)
        {
            FS.Position = 0;

            byte[] Preamble = UnicodeEncodings[i].GetPreamble();

            bool PreamblesAreEqual = true;

            for (int j = 0; PreamblesAreEqual && j < Preamble.Length; j++)
            {
                PreamblesAreEqual = Preamble[j] == FS.ReadByte();
            }

            if (PreamblesAreEqual)
            {
                Result = UnicodeEncodings[i];
            }
        }
    }
    catch (System.IO.IOException)
    {
        // Ignore read errors.  Result stays null, so Encoding.Default is returned below.
    }
    finally
    {
        if (FS != null)
        {
            FS.Close();
        }
    }

    if (Result == null)
    {
        Result = Encoding.Default;
    }

    return Result;
}
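
Putting it together, here is a rough sketch of the read, modify, write-back cycle. RoundTripDemo, DetectBomEncoding and AppendLine are hypothetical names. DetectBomEncoding is a condensed stand-in for GetFileEncoding() so that the example compiles on its own, and it assumes a local file where a single Read call returns the first three bytes.

```csharp
using System;
using System.IO;
using System.Text;

public static class RoundTripDemo
{
    // Condensed stand-in for GetFileEncoding(): match the file's first bytes against known BOMs.
    public static Encoding DetectBomEncoding(string fileName)
    {
        byte[] head = new byte[3];
        int count;

        using (FileStream fs = File.OpenRead(fileName))
        {
            count = fs.Read(head, 0, head.Length);
        }

        Encoding[] candidates = { Encoding.BigEndianUnicode, Encoding.Unicode, Encoding.UTF8 };

        foreach (Encoding candidate in candidates)
        {
            byte[] bom = candidate.GetPreamble();
            bool match = count >= bom.Length;

            for (int i = 0; match && i < bom.Length; i++)
            {
                match = head[i] == bom[i];
            }

            if (match)
            {
                return candidate;
            }
        }

        return Encoding.Default;
    }

    // Hypothetical helper: appends a line while keeping the file's original encoding.
    public static void AppendLine(string fileName, string line)
    {
        // Determine the encoding before the file is overwritten.
        Encoding original = DetectBomEncoding(fileName);

        string contents;

        using (StreamReader reader = new StreamReader(fileName, true))
        {
            contents = reader.ReadToEnd();
        }

        // Passing the original encoding makes the StreamWriter emit the same BOM and byte layout.
        using (StreamWriter writer = new StreamWriter(fileName, false, original))
        {
            writer.Write(contents);
            writer.WriteLine(line);
        }
    }
}
```

The important part is the third StreamWriter constructor argument. Without it the text would be written back as UTF-8, whatever the file started out as.
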
Keywords: Unicode, Encoding, StreamReader, StreamWriter, BOM, Byte Order Mark, BigEndianUnicode, Encoding.Default, Encoding.ASCII, Encoding.Unicode, GetPreamble
