Eric Bergman-Terrell's Blog

Java Programming Tip: How to read a file and automatically specify the correct encoding

November 15, 2008

By default, Java reads text files using the platform's default encoding. If you know a file's encoding, you can specify it in the constructor of the InputStreamReader that wraps the FileInputStream used to read the file. The following class automates this by checking the start of the file for a byte order mark (BOM): the file is read as UTF-8 or UTF-16 (big or little endian) when the matching BOM is present, and with the system default encoding otherwise.
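Before the full class, here is a minimal sketch of the underlying mechanism: the encoding is handed to InputStreamReader, while FileInputStream only supplies raw bytes. The class name ExplicitEncodingExample, the method readFirstLine, and the temporary demo file are assumptions made for illustration and are not part of the class that follows.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, used only for this illustration.
final class ExplicitEncodingExample {
    /** Reads the first line of a file, decoding its bytes with the given charset. */
    static String readFirstLine(String filePath, Charset charset) throws IOException {
        // The charset goes to InputStreamReader; FileInputStream only supplies raw bytes.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(filePath), charset))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo under stated assumptions: write a temporary UTF-16LE file, then read it back.
        Path path = Files.createTempFile("encoding-demo", ".txt");
        Files.write(path, "caf\u00e9\nsecond line".getBytes(StandardCharsets.UTF_16LE));
        System.out.println(readFirstLine(path.toString(), StandardCharsets.UTF_16LE));
        Files.delete(path);
    }
}
```

Reading the same file with a different charset would not fail. It would silently produce garbled text, which is why picking the encoding automatically is useful.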

package mainPackage;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class FileUtils {
    /**
     * Determines the encoding of the specified file. If a UTF-16 Byte Order Mark (BOM) is found, an encoding of "UTF16" is returned.
     * If a UTF-8 BOM is found, an encoding of "UTF8" is returned. Otherwise the default encoding is returned.
     * @param filePath file path
     * @return "UTF8", "UTF16", or default encoding.
     */
    private static String getEncoding(String filePath) {
        String encoding = System.getProperty("file.encoding");

        FileInputStream inputStream = null;

        try {
            // Read raw bytes. Going through a Reader would decode the BOM bytes
            // with the default encoding before they could be compared.
            inputStream = new FileInputStream(filePath);

            byte[] buffer = new byte[3];
            int length = inputStream.read(buffer);

            if (length >= 2) {
                if ((buffer[0] == (byte) 0xff && buffer[1] == (byte) 0xfe) /* UTF-16, little endian */ ||
                    (buffer[0] == (byte) 0xfe && buffer[1] == (byte) 0xff) /* UTF-16, big endian */) {
                    encoding = "UTF16";
                }
            }
            if (length >= 3) {
                if (buffer[0] == (byte) 0xef && buffer[1] == (byte) 0xbb && buffer[2] == (byte) 0xbf) /* UTF-8 */ {
                    encoding = "UTF8";
                }
            }
        }
        catch (IOException ex) {
            // Fall back to the default encoding if the file cannot be read.
        }
        finally {
            if (inputStream != null) {
                try {
                    inputStream.close();
                }
                catch (IOException ex) {
                    // Nothing useful can be done if closing fails.
                }
            }
        }

        return encoding;
    }

    /**
     * Returns the text of the specified file. If a Unicode Byte Order Mark (BOM) is found, the file is read with the corresponding encoding.
     * Otherwise the file is read using the default encoding.
     * @param filePath file path
     * @return text of file
     * @throws IOException if the file cannot be opened or read
     */
    public static String readFile(String filePath) throws IOException {
        String encoding = getEncoding(filePath);

        BufferedReader bufferedReader = null;

        StringBuilder text = new StringBuilder();

        try {
            // The encoding is passed to InputStreamReader, which decodes the raw bytes.
            bufferedReader = new BufferedReader(new InputStreamReader(new FileInputStream(filePath), encoding));

            char[] buffer = new char[1024 * 16];
            int length;

            while ((length = bufferedReader.read(buffer)) != -1) {
                text.append(buffer, 0, length);
            }
        }
        finally {
            if (bufferedReader != null) {
                bufferedReader.close();
            }
        }

        return text.toString();
    }
}
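
One caveat when using readFile: Java's UTF-16 decoder consumes the BOM, but its UTF-8 decoder does not, so text decoded from a UTF-8 file that starts with a BOM begins with the character U+FEFF. The following is a minimal sketch of removing it after reading. The class name BomUtils and the method stripBom are assumptions for illustration, not part of FileUtils.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, used only for this illustration.
final class BomUtils {
    /** The character that a decoded byte order mark turns into. */
    static final char BOM = '\uFEFF';

    /** Returns the text without a leading U+FEFF, or the text unchanged if there is none. */
    static String stripBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        // The UTF-8 BOM bytes EF BB BF followed by the ASCII text "hi".
        byte[] bytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        String decoded = new String(bytes, StandardCharsets.UTF_8);

        System.out.println(decoded.length());           // prints 3
        System.out.println(stripBom(decoded).length()); // prints 2
    }
}
```

A caller could wrap the result as BomUtils.stripBom(FileUtils.readFile(path)) if the leading character gets in the way, for example when comparing the first line of the file against an expected string.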

Keywords: Java, file, encoding, Unicode, UTF8, UTF-8, UTF16, UTF-16, file I/O, FileInputStream, InputStreamReader, BufferedReader

Reader Comments

Name	Comment	URL	Date/Time
Victor	This is not helpful, because only very few files do have a BOM. And UTF-8 with BOM is discouraged. Better use a library like GuessEncoding or juniversalchardet.		October 22, 2013
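
For files without a BOM, one partial approach that needs only the JDK is to test whether the bytes are well-formed UTF-8 by decoding them strictly. This is a rough sketch, not a replacement for the detection libraries mentioned in the comment above: it cannot tell single-byte encodings apart, and plain ASCII passes as valid UTF-8. The class name Utf8Check and the method isValidUtf8 are assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, used only for this illustration.
final class Utf8Check {
    /** Returns true if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of substituting U+FFFD for bad input.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "h\u00e9llo".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(isValidUtf8(utf8));   // prints true
        System.out.println(isValidUtf8(latin1)); // prints false
    }
}
```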
