Eric Bergman-Terrell's Blog

.NET Programming Tip: How to Determine the Encoding of a Unicode File

October 4, 2010

The StreamReader class allows you to read in Unicode text from a file without having to worry about the precise encoding:

...
String Contents;

// The using block closes the reader even if ReadToEnd() throws.
using (StreamReader SR = new StreamReader(FileName, true))
{
    Contents = SR.ReadToEnd();
}
...

For example, the above code works for Unicode files having the following Encodings: Encoding.BigEndianUnicode, Encoding.Unicode, and Encoding.UTF8. It also works if the file is encoded in Encoding.ASCII format: an ASCII file has no byte order mark, so the reader falls back to UTF-8, and ASCII text is valid UTF-8.

The file's encoding is automatically detected because the StreamReader constructor's second argument (detectEncodingFromByteOrderMarks) is true.

There's no problem reading in Unicode text using the StreamReader. The problem is writing updated text back to the file with the original Encoding intact. For example, if your program reads text in Encoding.BigEndianUnicode format, it should write it back in the same format.

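Unfortunately, the StreamReader object doesn't make the original Encoding easy to get at. Be careful with the CurrentEncoding member: when I experimented with it, it was always Encoding.UTF8, regardless of the text file's actual Encoding. That matches the documented behavior before the first read. Encoding detection does not happen until the first call to a Read method, so until then CurrentEncoding simply reports the encoding the reader was constructed with (UTF-8 by default). The value is only meaningful after some text has been read.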
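A quick way to see how CurrentEncoding behaves is to check it before and after a read. This is only a sketch: the class name CurrentEncodingDemo and the helper Probe are made up for illustration, and the temporary file is written as Encoding.BigEndianUnicode just to have a known BOM. Code pages are compared rather than names, since encoding names can differ between .NET versions.

```csharp
using System;
using System.IO;
using System.Text;

public static class CurrentEncodingDemo
{
    // Hypothetical helper: returns the code page reported by CurrentEncoding
    // before and after the first read from the file.
    public static int[] Probe(string fileName)
    {
        using (StreamReader reader = new StreamReader(fileName, true))
        {
            // Nothing has been read yet, so this is the constructor's default (UTF-8).
            int before = reader.CurrentEncoding.CodePage;

            // The first read is what triggers BOM detection.
            reader.ReadToEnd();

            int after = reader.CurrentEncoding.CodePage;

            return new int[] { before, after };
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();

        try
        {
            // Writes a big-endian UTF-16 file, including its BOM.
            File.WriteAllText(path, "hello", Encoding.BigEndianUnicode);

            int[] codePages = Probe(path);

            Console.WriteLine("Before read: code page " + codePages[0]);
            Console.WriteLine("After read:  code page " + codePages[1]);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

If the second value differs from the first, the reader picked up the BOM during the read. Either way, the encoding is not available until some text has actually been read.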

So how can you use a StreamWriter to write back text read from a StreamReader, with the original encoding intact? Use the following code to determine the file's original encoding, and specify that encoding in the StreamWriter's constructor.

Unicode files usually start with a short prefix called a BOM (Byte Order Mark) that identifies the exact Encoding of the file: two bytes for the UTF-16 encodings (Encoding.Unicode and Encoding.BigEndianUnicode) and three bytes for Encoding.UTF8. GetFileEncoding() iterates through various Unicode Encoding values and compares the file's first bytes with the current Encoding's BOM (returned by the GetPreamble() member). When a match is found, the corresponding Encoding value is returned. If no matches are found, for example because the file is ASCII or was saved as UTF-8 without a BOM, the Encoding.Default value is returned.
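To see the actual BOM bytes, you can print what GetPreamble() returns for each Encoding. This is a small sketch; the class name PreambleDemo and the helper PreambleHex are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text;

public static class PreambleDemo
{
    // Hypothetical helper: formats an Encoding's BOM as hyphen-separated hex bytes.
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble());
    }

    public static void Main()
    {
        Console.WriteLine("BigEndianUnicode: " + PreambleHex(Encoding.BigEndianUnicode)); // FE-FF
        Console.WriteLine("Unicode:          " + PreambleHex(Encoding.Unicode));          // FF-FE
        Console.WriteLine("UTF8:             " + PreambleHex(Encoding.UTF8));             // EF-BB-BF
        Console.WriteLine("ASCII:            " + PreambleHex(Encoding.ASCII));            // empty, ASCII has no BOM
    }
}
```

Note that the UTF-8 BOM is three bytes long, and that Encoding.ASCII has no preamble at all, which is why a BOM comparison cannot tell an ASCII file apart from any other file without a BOM.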

public static Encoding GetFileEncoding(String FileName)
{
    // Return the Encoding of a text file.  Return Encoding.Default if no Unicode
    // BOM (byte order mark) is found.
    // Requires the System.IO and System.Text namespaces.

    Encoding Result = null;

    FileInfo FI = new FileInfo(FileName);

    FileStream FS = null;

    try
    {
        FS = FI.OpenRead();

        Encoding[] UnicodeEncodings = { Encoding.BigEndianUnicode, Encoding.Unicode, Encoding.UTF8 };

        for (int i = 0; Result == null && i < UnicodeEncodings.Length; i++)
        {
            FS.Position = 0;

            byte[] Preamble = UnicodeEncodings[i].GetPreamble();

            bool PreamblesAreEqual = true;

            for (int j = 0; PreamblesAreEqual && j < Preamble.Length; j++)
            {
                PreamblesAreEqual = Preamble[j] == FS.ReadByte();
            }

            if (PreamblesAreEqual)
            {
                Result = UnicodeEncodings[i];
            }
        }
    }
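    catch (System.IO.IOException)
    {
        // Deliberately ignored: if the file cannot be read, fall through and
        // return Encoding.Default below.
    }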
    finally
    {
        if (FS != null)
        {
            FS.Close();
        }
    }

    if (Result == null)
    {
        Result = Encoding.Default;
    }

    return Result;
}
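Putting it together, a read-modify-write round trip might look like the following sketch. So that the sample compiles on its own, it uses DetectBomEncoding, a hypothetical compact stand-in for GetFileEncoding() that checks the same three encodings and falls back to Encoding.Default. The names RoundTripDemo and AppendText are also made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class RoundTripDemo
{
    // Hypothetical compact stand-in for GetFileEncoding(): compares the first
    // bytes of the file with the BOM of each candidate Encoding.
    public static Encoding DetectBomEncoding(string fileName)
    {
        byte[] head = new byte[3]; // the longest BOM checked here (UTF-8) is 3 bytes
        int count;

        using (FileStream stream = File.OpenRead(fileName))
        {
            count = stream.Read(head, 0, head.Length);
        }

        Encoding[] candidates = { Encoding.BigEndianUnicode, Encoding.Unicode, Encoding.UTF8 };

        foreach (Encoding candidate in candidates)
        {
            byte[] preamble = candidate.GetPreamble();
            bool match = count >= preamble.Length;

            for (int i = 0; match && i < preamble.Length; i++)
            {
                match = head[i] == preamble[i];
            }

            if (match)
            {
                return candidate;
            }
        }

        return Encoding.Default; // no BOM found
    }

    // Reads the file, appends some text, and writes it back in the original Encoding.
    public static void AppendText(string fileName, string extraText)
    {
        Encoding original = DetectBomEncoding(fileName);
        string contents;

        using (StreamReader reader = new StreamReader(fileName, original, true))
        {
            contents = reader.ReadToEnd();
        }

        // append: false overwrites the file; the writer emits the Encoding's BOM again.
        using (StreamWriter writer = new StreamWriter(fileName, false, original))
        {
            writer.Write(contents + extraText);
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();

        try
        {
            File.WriteAllText(path, "abc", Encoding.Unicode);
            AppendText(path, "def");

            Console.WriteLine("BOM after rewrite: " +
                BitConverter.ToString(File.ReadAllBytes(path), 0, 2));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

The encoding is detected before the file is opened for writing, because creating the StreamWriter with append set to false truncates the file and destroys the original BOM.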

Keywords: Unicode, Encoding, StreamReader, StreamWriter, BOM, Byte Order Mark, BigEndianUnicode, Encoding.Default, Encoding.ASCII, Encoding.Unicode, GetPreamble
