What to do if java io is garbled

藏色散人 (Original)
2020-05-21 10:52:02

IO and garbled Chinese characters in Java

This is my first post of this kind, so I struggled with how to start. I'll go straight to the topic and record some of what I have learned, starting with the simple things. I have only just started learning, and I hope to keep at it and write better, more meaningful content in the future.

In fact, we rarely run into garbled Chinese characters at work, because the company standardizes file encodings to make development easier. Still, I think it is worth understanding a little of the principle behind the problem.

IO means input and output streams. In object-oriented terms, these are input stream and output stream objects, and in this article we use them to operate on files. So let's first look at the concept of a file, that is, the File object. In Java, a File is not the concrete file we talk about in daily life but a path object. For example, File file = new File("D:\\aaa"); creates a File object. It may represent a folder, or the path may not exist at all, but the code still creates a File object representing that path. This form is less common in practice, because we usually work with a specific text file, picture and so on, such as File f = new File("aaa.txt");
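To make the "File is only a path" point concrete, here is a minimal sketch. The class name FilePathDemo, the helper describe and the path no-such-dir-for-demo are made up for illustration, and the sketch assumes no such directory exists where it is run.

```java
import java.io.File;

class FilePathDemo {
    // Describes what a File path currently refers to on disk.
    static String describe(File f) {
        if (!f.exists()) {
            return "does not exist";
        }
        return f.isDirectory() ? "directory" : "file";
    }

    public static void main(String[] args) {
        // Constructing a File never touches the disk; it only stores the path.
        // (Hypothetical path, assumed not to exist.)
        File missing = new File("no-such-dir-for-demo", "aaa");
        System.out.println(missing.getPath() + " -> " + describe(missing));

        // "." is the current working directory, which does exist.
        File current = new File(".");
        System.out.println(current.getPath() + " -> " + describe(current));
    }
}
```

The constructor succeeds in both cases; only exists() and isDirectory() actually consult the file system.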

That briefly covers what IO and File are. Next, let's look at how pictures, text, videos and other information are stored on our storage devices.

My understanding is that every file, whatever its type, is stored in binary. The smallest addressable unit is 1 byte, made up of 8 bits, each 0 or 1. So to copy a file we only need a byte stream: read all the bytes from one file and write them to another. This works for any file, but text files are a special case, because their bytes also have to be interpreted as characters.

This is where garbled Chinese characters come from. Everyone is familiar with the ASCII code table, or has at least heard of it. It is one of the earliest code tables, and it defines only 128 characters: the upper- and lower-case English letters, digits, punctuation and some control characters. Because computers only work with binary, each character has to be mapped to a byte value, and that mapping is the code table.

As computers spread, ASCII was no longer enough, and many countries created their own encoding schemes, so different code tables emerged. Common ones are GBK and UTF-8. Inside the JVM, char and String values are Unicode, stored as UTF-16 code units of 2 bytes each, so a common Chinese character occupies one 2-byte char (rarer characters outside the Basic Multilingual Plane take two). GBK also uses 2 bytes for a Chinese character, while UTF-8 uses 3 bytes for most Chinese characters and 4 for the rarer ones. So a problem arises: the same Chinese character maps to a different number of bytes, with different values, in different code tables.
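The byte counts can be checked directly with String.getBytes. This is a small sketch rather than part of the original article: the class name CharsetByteCounts and the helper byteCount are made up for illustration, and it assumes the running JDK ships the GBK charset (standard JDK builds normally do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetByteCounts {
    // Number of bytes the given text occupies in the given encoding.
    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // U+4E2D ("middle"), written as an escape so the source file
        // encoding does not affect the result.
        String zhong = "\u4e2d";
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available

        System.out.println("GBK:      " + byteCount(zhong, gbk));                        // 2
        System.out.println("UTF-8:    " + byteCount(zhong, StandardCharsets.UTF_8));     // 3
        System.out.println("UTF-16BE: " + byteCount(zhong, StandardCharsets.UTF_16BE));  // 2
        System.out.println("chars:    " + zhong.length());                               // 1
    }
}
```

UTF_16BE is used instead of UTF_16 because the latter prepends a 2-byte byte order mark when encoding, which would inflate the count.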

So how to solve it?

When we copy a picture from disk A to disk B, we only need to transfer all of its bytes. Text can be copied the same way, provided the copy is later read with the same encoding the original was written in; a picture has no such character encoding issue. But what if we receive Chinese text from the network or a server in one encoding and need it in another? Copying bytes alone cannot do that, and we cannot manually change a file's encoding every time a problem appears. So Java provides character streams, which add an encoding setting on top of a byte stream and let us solve the garbled-character problem.

Without further ado, let’s use a few small cases to illustrate:

1. First, create aa.txt and bb.txt in the current project and write a few Chinese characters in aa.txt. You will find that both of the following methods copy the text correctly.

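a. Using a character stream

// FileReader and FileWriter decode and encode with the default charset
// (the platform default before Java 18, UTF-8 from Java 18 on).
try (FileReader fr = new FileReader("aa.txt");
     FileWriter fw = new FileWriter("bb.txt")) {
    int c;
    while ((c = fr.read()) != -1) { // read() returns one char, or -1 at end of file
        fw.write(c);
    }
} // both streams are closed automatically, even if an exception is thrown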

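b. Using a byte stream

// Byte streams copy the raw bytes without interpreting them as characters.
try (FileInputStream fis = new FileInputStream("aa.txt");
     FileOutputStream fos = new FileOutputStream("bb.txt")) {
    int b;
    while ((b = fis.read()) != -1) { // read() returns one byte (0-255), or -1 at end of file
        fos.write(b);
    }
} // both streams are closed automatically, even if an exception is thrown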

2. Now suppose aa.txt is encoded in UTF-8, and we set the encoding of bb.txt to GBK (for example in the editor's file properties). Run either of the two methods again and the Chinese text in bb.txt comes out garbled.

The reason is that the two files use different encodings. The same Chinese character maps to different bytes in each code table, so bytes written under one table are misread under the other.
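The mismatch can be reproduced in memory, without any files. This is a minimal sketch under the same assumption that the GBK charset is available; the class name MojibakeDemo and the helper decode are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Interprets raw bytes as text under the given code table.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters, U+4E2D U+6587
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available

        // Encode with UTF-8: each character becomes 3 bytes, 6 in total.
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        String wrong = decode(utf8Bytes, gbk);                    // wrong code table
        String right = decode(utf8Bytes, StandardCharsets.UTF_8); // matching code table

        System.out.println("decoded as GBK equals original:   " + original.equals(wrong)); // false
        System.out.println("decoded as UTF-8 equals original: " + original.equals(right)); // true
    }
}
```

The bytes themselves are never damaged; only the interpretation changes. That is why copying bytes is safe, while decoding them with the wrong table is not.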

3. So when the two files use different encodings, we can specify the encoding of each file when reading and writing.

The implementation is as follows:

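// Decode aa.txt as UTF-8 and encode bb.txt as GBK.
try (InputStreamReader isr = new InputStreamReader(new FileInputStream("aa.txt"), "utf-8");
     OutputStreamWriter osw = new OutputStreamWriter(new FileOutputStream("bb.txt"), "gbk")) {
    char[] arr = new char[1024];
    int len;
    while ((len = isr.read(arr)) != -1) { // len is the number of chars actually read
        String s = new String(arr, 0, len);
        System.out.print(s);              // echo the decoded text to the console
        osw.write(s);
    }
} // both streams are closed (and the writer flushed) automatically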

Although the code is very simple, let's briefly explain it. From the API we can see that InputStreamReader and OutputStreamWriter both operate on characters: they extend Reader and Writer respectively.

Their main job is to convert bytes into characters and characters into bytes, which is why their constructors take a byte stream object. Here the byte stream is read and decoded into characters as UTF-8, and the characters are then encoded as GBK and written out as bytes.
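The decode-then-encode step can also be written as a reusable method and checked in memory. The following is a sketch, not the article's own code: the class name TranscodeDemo and both transcode methods are made up for illustration, and GBK is again assumed to be available. Any InputStream and OutputStream, including FileInputStream and FileOutputStream, can be passed in.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TranscodeDemo {
    // Decodes the bytes of `in` using `from`, then re-encodes them to `out` using `to`.
    static void transcode(InputStream in, Charset from, OutputStream out, Charset to)
            throws IOException {
        try (Reader reader = new InputStreamReader(in, from);
             Writer writer = new OutputStreamWriter(out, to)) {
            char[] buffer = new char[1024];
            int len;
            while ((len = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, len); // write only the chars actually read
            }
        } // closing the writer flushes any pending bytes to `out`
    }

    // Convenience wrapper that works on byte arrays.
    static byte[] transcode(byte[] input, Charset from, Charset to) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(input), from, out, to);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available
        String text = "\u4e2d\u6587";         // U+4E2D U+6587

        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] asGbk = transcode(utf8, StandardCharsets.UTF_8, gbk);

        System.out.println(utf8.length + " UTF-8 bytes -> " + asGbk.length + " GBK bytes"); // 6 -> 4
        System.out.println(new String(asGbk, gbk).equals(text)); // true
    }
}
```

Passing Charset objects instead of names such as "utf-8" avoids the checked UnsupportedEncodingException that the string-based constructors declare.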

The remaining lines need no explanation, since they are all basic methods. The object passed to each constructor is an anonymous object (an instance with no variable name), and wrapping one stream inside another is the decorator design pattern. A basic understanding of this style is enough.

There are many other useful classes for byte streams and character streams, such as BufferedInputStream and BufferedReader, which I won't go into here.
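As one illustration of those classes, the sketch below wraps the charset-aware streams in BufferedWriter and BufferedReader to read and write whole lines. The class name BufferedCharsetDemo, its two helper methods and the temporary file name are made up for illustration, and GBK is assumed to be available.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

class BufferedCharsetDemo {
    // Writes each line to the file, encoded with the given charset.
    static void writeLines(File file, Charset charset, String... lines) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), charset))) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine(); // platform line separator
            }
        }
    }

    // Reads all lines of the file, decoding with the given charset.
    static List<String> readLines(File file, Charset charset) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset))) {
            String line;
            while ((line = reader.readLine()) != null) { // null marks end of file
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available
        File tmp = File.createTempFile("charset-demo", ".txt");
        tmp.deleteOnExit();

        writeLines(tmp, gbk, "\u4e2d\u6587", "abc");
        System.out.println(readLines(tmp, gbk)); // the same two lines come back
    }
}
```

The buffered wrappers do not change the encoding logic. They only cut down the number of underlying reads and writes, and add readLine and newLine for line-based work.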

I will not cover the encoding issues of the JVM and the system platform here.

You can experiment with String yourself and observe the bytes of a string, and the encoding problems that arise, at compile time and at run time.

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn