
Detailed introduction to the difference between character stream and byte stream in Java

黄舟
2017-03-25 10:33:32

This article explains the difference between character streams and byte streams in Java.

The difference between character stream and byte stream in Java

1. What is a stream

A stream in Java is an abstraction of a byte sequence. Picture a water pipe, except that what flows through it is not water but a sequence of bytes. Like flowing water, a stream in Java has a direction: an object from which a byte sequence can be read is called an input stream, and an object to which a byte sequence can be written is called an output stream.
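To make the idea of flow direction concrete, here is a minimal sketch that pulls bytes out of an input stream and pushes them into an output stream. The class name StreamDirectionDemo and the helper copy are made up for illustration, and ByteArrayInputStream and ByteArrayOutputStream stand in for a file or socket so that the example needs no external resources.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamDirectionDemo {
  // Hypothetical helper: moves every byte from the input stream to the output stream.
  static long copy(InputStream in, OutputStream out) throws IOException {
    long count = 0;
    int b;
    // read() returns the next byte as an int in 0..255, or -1 at end of stream
    while ((b = in.read()) != -1) {
      out.write(b);
      count++;
    }
    return count;
  }

  public static void main(String[] args) throws IOException {
    byte[] source = {1, 2, 3, 4};
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    long copied = copy(new ByteArrayInputStream(source), sink);
    System.out.println(copied + " bytes copied");
  }
}
```

The same copy helper would work unchanged with a FileInputStream or FileOutputStream, because it depends only on the abstract InputStream and OutputStream types.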

2. Byte stream

The basic unit of byte stream processing in Java is a single byte, and byte streams are usually used to process binary data. The two most basic byte stream classes are InputStream and OutputStream, which represent the basic input byte stream and output byte stream respectively. Both are abstract classes; in practice we usually use one of their subclasses provided in the Java class library. Let's take the InputStream class as an example to introduce byte streams in Java.

The InputStream class defines a basic method, read, for reading bytes from the byte stream. It is declared as follows:

public abstract int read() throws IOException;

This is an abstract method, which means that every concrete input byte stream class derived from InputStream must implement it. The method reads one byte from the byte stream: it returns the byte as an int in the range 0 to 255, or -1 if the end of the stream has been reached. Note that the call blocks until a byte is available, the end of the stream is detected, or an exception is thrown. In addition, byte streams do not buffer by default, so for a stream such as FileInputStream each call to read asks the operating system for a single byte, which is often accompanied by disk I/O and is therefore relatively inefficient. Some readers may think that the overload of read that takes a byte array can read many bytes at once and so avoid frequent disk I/O. Is this really the case? Let's take a look at the source code of this method in InputStream:

public int read(byte b[]) throws IOException {
  return read(b, 0, b.length);
}

It delegates to another overload of read, so let's follow that one:

  public int read(byte b[], int off, int len) throws IOException {
    if (b == null) {
      throw new NullPointerException();
    } else if (off < 0 || len < 0 || len > b.length - off) {
      throw new IndexOutOfBoundsException();
    } else if (len == 0) {
      return 0;
    }

    int c = read();
    if (c == -1) {
      return -1;
    }
    b[off] = (byte)c;

    int i = 1;
    try {
      for (; i < len ; i++) {
        c = read();
        if (c == -1) {
          break;
        }
        b[off + i] = (byte)c;
      }
    } catch (IOException ee) {
    }
    return i;
  }

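From the code above we can see that the default read(byte[]) implementation in InputStream fills the array "at a time" by calling read() in a loop, so by itself it neither uses a memory buffer nor reduces the number of underlying reads. Concrete subclasses may override this method with a more efficient bulk read (FileInputStream does), but to buffer data in memory and make repeated small reads cheap, we should wrap the stream in a BufferedInputStream.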
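The looping behavior of the inherited method can be observed directly. The sketch below uses a made-up stream class, CountingStream, that implements only the abstract read() and counts how often it is called; the names ReadCallCounter and readsForOneBulkCall are likewise hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;

class ReadCallCounter {
  // Hypothetical stream that implements only the abstract read() and counts its calls.
  static class CountingStream extends InputStream {
    private final byte[] data;
    private int pos = 0;
    int singleByteReads = 0;

    CountingStream(byte[] data) {
      this.data = data;
    }

    @Override
    public int read() {
      singleByteReads++;
      // Mask with 0xFF so the byte is returned as an int in 0..255
      return pos < data.length ? (data[pos++] & 0xFF) : -1;
    }
  }

  // Fills a buffer of bufferSize bytes once and reports how often read() ran.
  static int readsForOneBulkCall(byte[] data, int bufferSize) throws IOException {
    CountingStream in = new CountingStream(data);
    in.read(new byte[bufferSize]); // inherited default implementation from InputStream
    return in.singleByteReads;
  }

  public static void main(String[] args) throws IOException {
    // 10 bytes available, buffer of 4: the inherited method calls read() 4 times
    System.out.println(readsForOneBulkCall(new byte[10], 4));
  }
}
```

The count matches the buffer size only because CountingStream inherits the default method; a stream class that overrides read(byte[], int, int) would not show this pattern.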

3. Character stream

The basic unit of character stream processing in Java is the Unicode code unit (a 16-bit UTF-16 code unit, 2 bytes in size), and character streams are usually used to process text data. A code unit ranges from 0x0000 to 0xFFFF. Most values in this range correspond directly to a character; characters outside the Basic Multilingual Plane are represented by a pair of code units (a surrogate pair). The char and String types in Java represent text in memory as sequences of these UTF-16 code units. Data stored on disk, however, can use any of a variety of encodings, and the same characters have different binary representations under different encodings. In fact, a character stream works like this:

  1. Output character stream: convert the character sequence to be written (actually a sequence of Unicode code units) into a byte sequence in the specified encoding, and then write it to the file;

  2. Input character stream: decode the byte sequence being read into the corresponding character sequence (actually a sequence of Unicode code units) according to the specified encoding, so that it can be stored in memory.
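The two directions can be sketched with OutputStreamWriter and InputStreamReader, the standard bridge classes between characters and bytes. The class CharsetRoundTrip and its helpers encode and decode are illustrative names, and in-memory streams are used in place of a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetRoundTrip {
  // Output direction: characters -> bytes in the given encoding.
  static byte[] encode(String text, Charset charset) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (Writer writer = new OutputStreamWriter(bytes, charset)) {
      writer.write(text);
    }
    return bytes.toByteArray();
  }

  // Input direction: bytes in the given encoding -> characters.
  static String decode(byte[] encoded, Charset charset) throws IOException {
    StringBuilder text = new StringBuilder();
    try (Reader reader = new InputStreamReader(new ByteArrayInputStream(encoded), charset)) {
      int c;
      while ((c = reader.read()) != -1) {
        text.append((char) c);
      }
    }
    return text.toString();
  }

  public static void main(String[] args) throws IOException {
    String text = "\u4e2d"; // the single character U+4E2D
    byte[] utf8 = encode(text, StandardCharsets.UTF_8);
    byte[] utf16 = encode(text, StandardCharsets.UTF_16BE);
    // Same character, different byte sequences: 3 bytes in UTF-8, 2 bytes in UTF-16BE
    System.out.println(utf8.length + " bytes in UTF-8, " + utf16.length + " bytes in UTF-16BE");
    System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(text));
  }
}
```

Decoding with a different charset than the one used for encoding would generally produce different characters, which is why the encoding has to be agreed on at both ends.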

We use a demo to deepen our understanding of this process. The sample code is as follows:

import java.io.FileWriter;
import java.io.IOException;

public class FileWriterDemo {
  public static void main(String[] args) {
    // try-with-resources closes the writer even if write() throws, and
    // avoids calling close() on a null reference if the constructor fails
    try (FileWriter fileWriter = new FileWriter("demo.txt")) {
      fileWriter.write("demo");
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}

In the above code, we use FileWriter to write the four characters "demo" to demo.txt. We then view the contents of demo.txt with the hexadecimal editor WinHex:

The hex view shows that the "demo" we wrote was encoded as "64 65 6D 6F", even though the code above does not explicitly specify an encoding. When no encoding is specified, FileWriter uses the platform's default character encoding to encode the characters being written.
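To avoid depending on the default, the encoding can be passed explicitly. The following is a minimal sketch, assuming UTF-8 is the encoding we want; ExplicitCharsetDemo, writeText and hexDump are made-up names, and a temporary file is used rather than the demo.txt from the example above.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitCharsetDemo {
  // Writes text with an explicitly chosen encoding instead of the platform default.
  static void writeText(Path file, String text, Charset charset) throws IOException {
    try (Writer writer = Files.newBufferedWriter(file, charset)) {
      writer.write(text);
    }
  }

  // Renders the file's raw bytes as space-separated hex pairs, like one row of a hex editor.
  static String hexDump(Path file) throws IOException {
    StringBuilder hex = new StringBuilder();
    for (byte b : Files.readAllBytes(file)) {
      if (hex.length() > 0) {
        hex.append(' ');
      }
      hex.append(String.format("%02X", b & 0xFF));
    }
    return hex.toString();
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("demo", ".txt"); // temporary file for the sketch
    try {
      writeText(file, "demo", StandardCharsets.UTF_8);
      System.out.println(hexDump(file)); // prints 64 65 6D 6F
    } finally {
      Files.deleteIfExists(file);
    }
  }
}
```

The four letters are plain ASCII, so UTF-8 and most other common single-byte encodings produce the same four bytes; the difference between encodings only becomes visible with non-ASCII text.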

Because a character stream must convert the sequence of Unicode code units into a byte sequence in the target encoding before output, it uses a memory buffer to hold the converted bytes, and then writes them to the disk file together once the buffer fills or the stream is flushed or closed.
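This buffering can be observed by checking how many bytes have reached the underlying stream before and after a flush. The sketch below is an illustration under assumptions: WriterBufferDemo and sizesAroundFlush are made-up names, and an in-memory ByteArrayOutputStream stands in for the disk file. On typical JDKs a short string stays in the writer's buffer, so the first number is expected to be smaller than the second, though the exact buffer size is an implementation detail.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

class WriterBufferDemo {
  // Returns {bytes visible in the sink before flush, bytes visible after flush}.
  static int[] sizesAroundFlush(String text) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (OutputStreamWriter writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
      writer.write(text);
      int before = sink.size(); // encoded bytes may still sit in the writer's buffer
      writer.flush();           // forces the buffered bytes out to the sink
      int after = sink.size();
      return new int[] {before, after};
    }
  }

  public static void main(String[] args) throws IOException {
    int[] sizes = sizesAroundFlush("demo");
    System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
  }
}
```

This is also why forgetting to flush or close a writer can leave a file shorter than expected.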

4. The difference between character stream and byte stream

From the description above, the main differences between byte streams and character streams are as follows:

  • The basic unit of byte stream operations is the byte; the basic unit of character stream operations is the Unicode code unit.

  • Byte streams do not use a buffer by default; character streams buffer the encoded bytes.

  • Byte streams are usually used to process binary data. In fact, they can process any type of data, but they do not support writing or reading Unicode code units directly. Character streams are usually used to process text data, and they support writing and reading Unicode code units.
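The difference in basic unit can be seen by reading the same data through both kinds of stream. In this minimal sketch, UnitCountDemo, countBytes and countChars are illustrative names, and the input is the two-character string "a" followed by U+4E2D, encoded as UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class UnitCountDemo {
  // Counts how many units a byte stream delivers: one per byte.
  static int countBytes(byte[] data) throws IOException {
    int units = 0;
    try (InputStream in = new ByteArrayInputStream(data)) {
      while (in.read() != -1) {
        units++;
      }
    }
    return units;
  }

  // Counts how many units a character stream delivers: one per decoded code unit.
  static int countChars(byte[] data, Charset charset) throws IOException {
    int units = 0;
    try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
      while (reader.read() != -1) {
        units++;
      }
    }
    return units;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "a\u4e2d".getBytes(StandardCharsets.UTF_8); // 1 byte + 3 bytes
    System.out.println(countBytes(data));                         // prints 4
    System.out.println(countChars(data, StandardCharsets.UTF_8)); // prints 2
  }
}
```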
