I recently ran into a problem: when Java reads a text file (a CSV or TXT file, for example) that contains Chinese characters, the characters come out garbled.
The code that reads the file is as follows:
List<String> lines = new ArrayList<String>();
BufferedReader br = new BufferedReader(new FileReader(fileName));
String line = null;
while ((line = br.readLine()) != null) {
    lines.add(line);
}
br.close();
Principle
Java's I/O class processing is as shown in the figure:
Reader is the parent class for reading characters in Java's I/O library, and InputStream is the parent class for reading bytes. InputStreamReader is the bridge from bytes to characters: it converts the bytes it reads into characters during I/O, and the actual decoding of bytes into characters is implemented by StreamDecoder.
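To make the bridge role concrete, here is a minimal sketch that feeds raw bytes through an InputStreamReader and collects the decoded characters. The class name BridgeDemo and the helper decode are hypothetical names chosen for illustration, and the sample text is written with Unicode escapes so that the source file itself stays ASCII-only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BridgeDemo {
    // Hypothetical helper: decodes raw bytes into text through an InputStreamReader.
    static String decode(byte[] bytes, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            // read() returns one UTF-16 code unit at a time, or -1 at end of stream
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes to keep this file ASCII-only
        byte[] utf8 = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);                                   // 6: three bytes per character in UTF-8
        System.out.println(decode(utf8, StandardCharsets.UTF_8).length()); // 2: two characters after decoding
    }
}
```

The same six bytes only become two characters because the reader was told which charset to decode with.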
StreamDecoder needs a Charset to decode with, and the caller is expected to supply it. If no Charset is specified, the default character set of the local environment is used; in a Chinese environment, for example, this is typically GBK. FileReader(fileName), as used in the code above, takes no charset argument, so it always falls back on this default.
Summary: when Java reads a data stream, specify the stream's encoding explicitly. Otherwise the default character set of the local environment is used, and the text comes out garbled whenever that default differs from the encoding the file was actually written in.
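The effect of a mismatch can be reproduced without any file at all. The sketch below encodes a short Chinese string as UTF-8 and then decodes the same bytes twice, once with the wrong charset and once with the right one. Note the assumptions: ISO-8859-1 stands in for the mismatched platform default because it is guaranteed to exist on every JVM (a GBK default on a UTF-8 file fails in the same way, with different garbage), and the class name MismatchDemo is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MismatchDemo {
    // Hypothetical helper: decodes bytes with the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters, as escapes
        byte[] fileBytes = original.getBytes(StandardCharsets.UTF_8);

        // The charset that readers fall back on when none is given
        System.out.println("Platform default: " + Charset.defaultCharset());

        // Stand-in for a mismatched default: every byte becomes its own character
        String wrong = decode(fileBytes, StandardCharsets.ISO_8859_1);
        // Matching charset: the original text is recovered
        String right = decode(fileBytes, StandardCharsets.UTF_8);

        System.out.println(original.equals(wrong)); // false
        System.out.println(original.equals(right)); // true
    }
}
```

The printed platform default depends on the machine and JDK, which is exactly why code that relies on it behaves differently from one environment to another.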
After the above analysis, the modified code is as follows:
List<String> lines = new ArrayList<String>();
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(fileName), "UTF-8"));
String line = null;
while ((line = br.readLine()) != null) {
    lines.add(line);
}
br.close();
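As an alternative to the BufferedReader loop, the same read can be sketched with java.nio.file.Files, which takes the charset as an explicit argument. This is not the approach the article itself uses; the class name NioReadDemo, the helpers readUtf8Lines and roundTrip, and the temporary file are all assumptions made so that the example is self-contained. It also assumes the file really is UTF-8.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class NioReadDemo {
    // Hypothetical helper: reads every line of a file, decoding it as UTF-8.
    static List<String> readUtf8Lines(Path path) {
        try {
            return Files.readAllLines(path, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: writes the lines to a temp file as UTF-8, then reads them back.
    static List<String> roundTrip(List<String> lines) {
        try {
            Path tmp = Files.createTempFile("charset-demo", ".txt");
            try {
                Files.write(tmp, lines, StandardCharsets.UTF_8);
                return readUtf8Lines(tmp);
            } finally {
                Files.deleteIfExists(tmp); // clean up the temp file either way
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A tiny CSV-like sample; the Chinese text is escaped to keep the source ASCII-only
        List<String> lines = Arrays.asList("id,name", "1,\u4e2d\u6587");
        System.out.println(roundTrip(lines).equals(lines)); // true
    }
}
```

One difference worth knowing: Files.readAllLines throws an IOException when the bytes are not valid in the given charset, whereas InputStreamReader silently substitutes replacement characters, so a wrong guess about the encoding tends to surface earlier.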