Package groovy.util
Class CharsetToolkit
- java.lang.Object
-
- groovy.util.CharsetToolkit
-
public class CharsetToolkit extends java.lang.Object
Utility class to guess the encoding of a given text file.
Unicode files encoded in UTF-16 (little or big endian) or UTF-8 files with a Byte Order Mark (BOM) are correctly discovered. For UTF-8 files with no BOM, if the buffer is wide enough, the charset should also be discovered.
A 4 KB byte buffer is used to guess the encoding.
Usage:
    CharsetToolkit toolkit = new CharsetToolkit(file);

    // guess the encoding
    Charset guessedCharset = toolkit.getCharset();

    // create a reader with the correct charset
    BufferedReader reader = toolkit.getReader();

    // read the file content
    String line;
    while ((line = reader.readLine()) != null) {
        System.out.println(line);
    }
-
-
Constructor Summary
Constructor: Description
CharsetToolkit(java.io.File file): Constructor of the CharsetToolkit utility class.
-
Method Summary
Modifier and Type, Method: Description
static java.nio.charset.Charset[] getAvailableCharsets(): Retrieves all the available Charsets on the platform, among which the default charset.
java.nio.charset.Charset getCharset()
java.nio.charset.Charset getDefaultCharset(): Retrieves the default Charset.
static java.nio.charset.Charset getDefaultSystemCharset(): Retrieves the default charset of the system.
boolean getEnforce8Bit(): Gets the enforce8Bit flag, in case we do not want to ever get a US-ASCII encoding.
java.io.BufferedReader getReader(): Gets a BufferedReader (indeed a LineNumberReader) from the File specified in the constructor of CharsetToolkit, using the charset discovered or the default charset if an 8-bit Charset is encountered.
boolean hasUTF16BEBom(): Has a Byte Order Mark for UTF-16 Big Endian (utf-16 and ucs-2).
boolean hasUTF16LEBom(): Has a Byte Order Mark for UTF-16 Little Endian (ucs-2le, ucs-4le, and ucs-16le).
boolean hasUTF8Bom(): Has a Byte Order Mark for UTF-8 (used by Microsoft's Notepad and other editors).
void setDefaultCharset(java.nio.charset.Charset defaultCharset): Defines the default Charset used in case the buffer represents an 8-bit Charset.
void setEnforce8Bit(boolean enforce): If US-ASCII is recognized, return the default encoding rather than US-ASCII.
-
-
-
Method Detail
-
setDefaultCharset
public void setDefaultCharset(java.nio.charset.Charset defaultCharset)
Defines the default Charset used in case the buffer represents an 8-bit Charset.
Parameters:
defaultCharset - the default Charset to be returned if an 8-bit Charset is encountered.
-
getCharset
public java.nio.charset.Charset getCharset()
-
setEnforce8Bit
public void setEnforce8Bit(boolean enforce)
If US-ASCII is recognized, return the default encoding rather than US-ASCII. The file may contain no special character in the range 128-255 and yet be, or later become, a file encoded with the default charset rather than US-ASCII.
Parameters:
enforce - a boolean specifying whether the default charset is returned in place of US-ASCII.
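The effect of the flag can be pictured with a small JDK-only sketch. It is not the CharsetToolkit implementation: the class `Enforce8BitSketch` and its `resolve` helper are hypothetical names, and the sketch ignores BOM and UTF-8 detection entirely. It only shows the decision described above for a buffer whose bytes all fall below 128.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Enforce8BitSketch {
    // Hypothetical helper (not part of CharsetToolkit): picks the charset to
    // report for a buffer, looking only at whether any byte is in 128-255.
    static Charset resolve(byte[] buffer, Charset defaultCharset, boolean enforce8Bit) {
        boolean sevenBitOnly = true;
        for (byte b : buffer) {
            if ((b & 0x80) != 0) { // high bit set: value in the range 128-255
                sevenBitOnly = false;
                break;
            }
        }
        // Pure 7-bit content is reported as US-ASCII only when the flag is off.
        if (sevenBitOnly && !enforce8Bit) {
            return StandardCharsets.US_ASCII;
        }
        return defaultCharset;
    }

    public static void main(String[] args) {
        byte[] ascii = "plain text".getBytes(StandardCharsets.US_ASCII);
        Charset fallback = StandardCharsets.ISO_8859_1; // assumed default for the demo
        System.out.println(resolve(ascii, fallback, false));
        System.out.println(resolve(ascii, fallback, true));
    }
}
```

With the flag on, a file that happens to contain only 7-bit characters today is still read with the default charset, so later edits that add accented characters do not change how it is decoded.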
-
getEnforce8Bit
public boolean getEnforce8Bit()
Gets the enforce8Bit flag, which is set when a US-ASCII encoding should never be returned.
Returns:
a boolean representing the enforce8Bit flag.
-
getDefaultCharset
public java.nio.charset.Charset getDefaultCharset()
Retrieves the default Charset.
-
getDefaultSystemCharset
public static java.nio.charset.Charset getDefaultSystemCharset()
Retrieves the default charset of the system.
Returns:
the default Charset.
-
hasUTF8Bom
public boolean hasUTF8Bom()
Has a Byte Order Mark for UTF-8 (used by Microsoft's Notepad and other editors).
Returns:
true if the buffer has a BOM for UTF-8.
-
hasUTF16LEBom
public boolean hasUTF16LEBom()
Has a Byte Order Mark for UTF-16 Little Endian (ucs-2le, ucs-4le, and ucs-16le).
Returns:
true if the buffer has a BOM for UTF-16 Little Endian.
-
hasUTF16BEBom
public boolean hasUTF16BEBom()
Has a Byte Order Mark for UTF-16 Big Endian (utf-16 and ucs-2).
Returns:
true if the buffer has a BOM for UTF-16 Big Endian.
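For reference, the three BOM checks described above come down to comparing the first bytes of the buffer against fixed signatures: EF BB BF for UTF-8, FF FE for UTF-16 little endian, and FE FF for UTF-16 big endian. The following is a minimal JDK-only sketch of that comparison. The class `BomSketch` and its method names are hypothetical and are not the CharsetToolkit source.

```java
final class BomSketch {
    // UTF-8 BOM: EF BB BF
    static boolean hasUtf8Bom(byte[] buf) {
        return buf.length >= 3
                && (buf[0] & 0xFF) == 0xEF
                && (buf[1] & 0xFF) == 0xBB
                && (buf[2] & 0xFF) == 0xBF;
    }

    // UTF-16 little endian BOM: FF FE
    // (a UTF-32LE BOM, FF FE 00 00, starts the same way; this sketch does not tell them apart)
    static boolean hasUtf16LeBom(byte[] buf) {
        return buf.length >= 2
                && (buf[0] & 0xFF) == 0xFF
                && (buf[1] & 0xFF) == 0xFE;
    }

    // UTF-16 big endian BOM: FE FF
    static boolean hasUtf16BeBom(byte[] buf) {
        return buf.length >= 2
                && (buf[0] & 0xFF) == 0xFE
                && (buf[1] & 0xFF) == 0xFF;
    }

    public static void main(String[] args) {
        byte[] sample = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i' };
        System.out.println(hasUtf8Bom(sample));
        System.out.println(hasUtf16LeBom(sample));
        System.out.println(hasUtf16BeBom(sample));
    }
}
```

Masking each byte with `0xFF` matters because Java bytes are signed, so `(byte) 0xEF` would otherwise compare as a negative number.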
-
getReader
public java.io.BufferedReader getReader() throws java.io.FileNotFoundException
Gets a BufferedReader (indeed a LineNumberReader) from the File specified in the constructor of CharsetToolkit, using the charset discovered or the default charset if an 8-bit Charset is encountered.
Returns:
a BufferedReader
Throws:
java.io.FileNotFoundException - if the file is not found.
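As an illustration of what such a reader looks like, the sketch below opens a file as a LineNumberReader with an explicitly supplied charset, using only JDK classes. The class `ReaderSketch` and its `open` method are hypothetical. The sketch takes the charset as a parameter instead of guessing it, and it makes no attempt to skip a leading BOM, so it should not be read as the behavior of getReader() itself.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

final class ReaderSketch {
    // Hypothetical helper: wraps the file in a LineNumberReader (a BufferedReader
    // subclass) that decodes bytes with the given charset.
    static BufferedReader open(File file, Charset charset) throws FileNotFoundException {
        return new LineNumberReader(new InputStreamReader(new FileInputStream(file), charset));
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("reader-sketch", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "first\nsecond\n".getBytes(StandardCharsets.UTF_8));

        try (BufferedReader reader = open(tmp, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```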
-
getAvailableCharsets
public static java.nio.charset.Charset[] getAvailableCharsets()
Retrieves all the available Charsets on the platform, among which the default charset.
Returns:
an array of Charsets.
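A comparable array can be built with the JDK alone, which may help when checking what the platform offers. The sketch below is one plausible way to do it with `Charset.availableCharsets()`. The class `AvailableCharsetsSketch` is a hypothetical name, and nothing here is taken from the CharsetToolkit source.

```java
import java.nio.charset.Charset;
import java.util.Collection;

final class AvailableCharsetsSketch {
    // Collects every charset the running JVM supports into an array.
    static Charset[] availableCharsets() {
        Collection<Charset> all = Charset.availableCharsets().values();
        return all.toArray(new Charset[0]);
    }

    public static void main(String[] args) {
        Charset[] charsets = availableCharsets();
        System.out.println("Available charsets: " + charsets.length);
        System.out.println("JVM default charset: " + Charset.defaultCharset());
    }
}
```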
-
-