Java character stream example analysis

王林
2023-04-28 16:40:07

    1. The origin of character streams

    Handling Chinese text with byte streams is inconvenient, so Java provides character streams for working with it.

    Implementation principle: character stream = byte stream + encoding table

    Why does copying a text file that contains Chinese characters with a byte stream cause no problems?

    Because the copy moves every byte unchanged, and the underlying layer that finally reads the file splices the bytes back into Chinese characters.

    How can a program tell that a byte belongs to a Chinese character?

    When a Chinese character is stored, in either UTF-8 or GBK, its first byte is negative when read as a signed Java byte, which signals the start of a multi-byte character.
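
    The negative first byte can be checked directly. The following is a minimal sketch, not part of the original article: the class name NegativeByteDemo and the helper firstByteIsNegative are hypothetical, and it assumes the running JRE ships the GBK charset. The Chinese character is written as a Unicode escape so the result does not depend on the source file's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NegativeByteDemo {
    // Hypothetical helper: true when the first encoded byte is negative as a signed Java byte.
    static boolean firstByteIsNegative(String s, Charset cs) {
        return s.getBytes(cs)[0] < 0;
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this JRE
        String zhong = "\u4e2d";              // the character 中

        System.out.println(Arrays.toString(zhong.getBytes(StandardCharsets.UTF_8))); // [-28, -72, -83]
        System.out.println(Arrays.toString(zhong.getBytes(gbk)));                    // [-42, -48]
        System.out.println(Arrays.toString("a".getBytes(StandardCharsets.UTF_8)));   // [97]
    }
}
```

    The ASCII letter encodes to a single non-negative byte, while the Chinese character starts with a negative byte in both encodings.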

    2. Encoding tables

    Character set:

    A character set is the collection of all characters a system supports, including national scripts, punctuation marks, graphic symbols, digits, and so on.

    To store and recognize the symbols of a character set accurately, a computer must encode them, so every character set needs at least one character encoding.

    Common character sets include ASCII, the GBXXX family, and Unicode.

    GBK: the most commonly used Chinese code table. It is an extension of the GB2312 standard, uses a double-byte encoding scheme, and contains 21,003 Chinese characters. It is fully compatible with GB2312 and also supports traditional Chinese characters and the Chinese characters used in Japanese and Korean.

    GB18030: the latest Chinese code table. It contains 70,244 Chinese characters and uses a multi-byte encoding in which each character takes 1, 2, or 4 bytes. It supports the scripts of China's ethnic minorities as well as traditional Chinese characters and the Chinese characters used in Japanese and Korean.

    Unicode character set:

    Unicode is designed to express any character in any language. It is an industry standard, also known as the Universal Code. It uses up to 4 bytes to express each letter, symbol, or ideograph. It has three encoding schemes: UTF-8, UTF-16, and UTF-32, of which UTF-8 is the most widely used.

    UTF-8: can represent any character in the Unicode standard and is the preferred encoding for email, web pages, and other applications that store or transfer text. The Internet Engineering Task Force (IETF) requires all Internet protocols to support UTF-8. It encodes each character in one to four bytes.

    UTF-8 encoding rules:

    The 128 US-ASCII characters need only one byte

    Latin letters with diacritics and characters of other alphabets (such as Greek and Cyrillic) need two bytes

    Most commonly used characters (including Chinese) need three bytes

    The remaining, rarely used Unicode supplementary characters need four bytes
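
    The four length classes above can be observed by encoding one sample character from each. This is an illustrative sketch only: the class name Utf8LengthDemo and the helper utf8Length are hypothetical, and the sample characters are my own choice, written as Unicode escapes.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo {
    // Hypothetical helper: number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));            // US-ASCII letter: 1
        System.out.println(utf8Length("\u00e9"));       // Latin e with acute accent: 2
        System.out.println(utf8Length("\u4e2d"));       // Chinese character 中: 3
        System.out.println(utf8Length("\uD83D\uDE00")); // supplementary character U+1F600: 4
    }
}
```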

    Summary: text must be decoded with the same rule that was used to encode it; otherwise it comes out garbled
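
    To see what a mismatch looks like, the sketch below encodes a string as UTF-8 and decodes it once with UTF-8 and once with GBK. The class name MismatchedCharsetDemo and the helper recode are hypothetical, and the example assumes the JRE provides the GBK charset. What the garbled string looks like on screen also depends on the console's encoding, so no exact output is claimed for it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MismatchedCharsetDemo {
    // Hypothetical helper: encode with one charset, then decode the bytes with another.
    static String recode(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this JRE
        String text = "\u4f60\u597d";         // 你好

        // Matching rules: the original text comes back.
        System.out.println(recode(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8));
        // Mismatched rules: the six UTF-8 bytes are misread as GBK and the text is garbled.
        System.out.println(recode(text, StandardCharsets.UTF_8, gbk));
    }
}
```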

    3. Encoding and decoding strings

    Encoding methods (String):

    byte[] getBytes(): Encodes the String into a sequence of bytes using the platform's default character set and stores the result in a new byte array

    byte[] getBytes(String charsetName): Encodes the String into a sequence of bytes using the specified character set and stores the result in a new byte array

    Decoding methods (String):

    String(byte[] bytes): Constructs a new String by decoding the specified byte array using the platform's default character set

    String(byte[] bytes, String charsetName): Constructs a new String by decoding the specified byte array using the specified character set

    The default encoding in IDEA is UTF-8
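
    A short sketch of these methods in use follows. The class StringCodecDemo and its helpers encodedLength and roundTrip are hypothetical names introduced for illustration. The charset names "UTF-8" and "GBK" are assumed to be supported by the JRE; getBytes(String) and String(byte[], String) throw UnsupportedEncodingException otherwise, which the helpers rethrow as an unchecked exception.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class StringCodecDemo {
    // Hypothetical helper: byte count of the text under the named charset.
    static int encodedLength(String text, String charsetName) {
        try {
            return text.getBytes(charsetName).length;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    // Hypothetical helper: encode and decode with the same named charset.
    static String roundTrip(String text, String charsetName) {
        try {
            byte[] bytes = text.getBytes(charsetName);
            return new String(bytes, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4f60\u597d"; // 你好
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println(encodedLength(text, "UTF-8")); // 6: three bytes per character
        System.out.println(encodedLength(text, "GBK"));   // 4: two bytes per character
        System.out.println(roundTrip(text, "GBK").equals(text)); // true
    }
}
```

    The no-argument getBytes() and String(byte[]) behave the same way but use whatever Charset.defaultCharset() reports, so their results can differ between machines.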

    4. Encoding and decoding in character streams

    Abstract base classes of character streams:

    Reader: abstract class for character input streams

    Writer: abstract class for character output streams

    Two classes handle encoding and decoding in character streams:

    InputStreamReader: a bridge from byte streams to character streams. It reads bytes and decodes them into characters using a specified character set. The character set can be specified by name, given explicitly, or left as the platform's default character set

    Constructors:

    InputStreamReader(InputStream in) Creates an InputStreamReader that uses the default character set.
    InputStreamReader(InputStream in, String charsetName) Creates an InputStreamReader that uses the named character set.

    OutputStreamWriter: a bridge from character streams to byte streams. It encodes the characters written to it into bytes using a specified character set. The character set can be specified by name, given explicitly, or left as the platform's default character set

    Constructors:

    OutputStreamWriter(OutputStream out) Creates an OutputStreamWriter that uses the default character encoding.
    OutputStreamWriter(OutputStream out, String charsetName) Creates an OutputStreamWriter that uses the named character set.
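
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.OutputStreamWriter;

    public class ConversionStreamDemo {
        public static void main(String[] args) throws IOException {
            // Create an OutputStreamWriter that uses the default character set
            OutputStreamWriter opsw = new OutputStreamWriter(new FileOutputStream("E:\\abc.txt"));
            // Write the data, then close the writer so the bytes reach the file
            opsw.write("你好啊");
            opsw.close();
            // Open the InputStreamReader (default character set) only after the file has been written
            InputStreamReader ipsr = new InputStreamReader(new FileInputStream("E:\\abc.txt"));
            // Read the data one character at a time
            int ch;
            while ((ch = ipsr.read()) != -1) {
                System.out.print((char) ch);
            }
            ipsr.close();
        }
    }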
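
    The two-argument constructors can be sketched the same way. The example below is only an illustration under stated assumptions: the class NamedCharsetStreamDemo and its helpers writeText, readText, and roundTrip are hypothetical, it writes to a temporary file instead of E:\abc.txt, and it assumes the JRE supports the charset name passed in ("GBK" here).

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class NamedCharsetStreamDemo {
    // Hypothetical helper: write text to a file using the named charset.
    static void writeText(Path file, String text, String charsetName) throws IOException {
        try (OutputStreamWriter out =
                 new OutputStreamWriter(new FileOutputStream(file.toFile()), charsetName)) {
            out.write(text);
        }
    }

    // Hypothetical helper: read the whole file, one character at a time, using the named charset.
    static String readText(Path file, String charsetName) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (InputStreamReader in =
                 new InputStreamReader(new FileInputStream(file.toFile()), charsetName)) {
            int ch;
            while ((ch = in.read()) != -1) {
                sb.append((char) ch);
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: write and read back through a temporary file with the same charset.
    static String roundTrip(String text, String charsetName) {
        try {
            Path file = Files.createTempFile("abc", ".txt");
            try {
                writeText(file, text, charsetName);
                return readText(file, charsetName);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4f60\u597d\u554a"; // 你好啊
        System.out.println(roundTrip(text, "GBK").equals(text)); // true
    }
}
```

    Reading the file with a different charset name than the one used for writing would produce garbled text, for the reason given in the summary of section 2.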

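    5. Five ways to write data with a character stream

    Method                                     Description
    void write(int c)                          Writes a single character
    void write(char[] cbuf)                    Writes a character array
    void write(char[] cbuf, int off, int len)  Writes part of a character array
    void write(String str)                     Writes a string
    void write(String str, int off, int len)   Writes part of a string

    Character streams write through a buffer. To push the buffered data out to its destination, call flush() after the write method.

    The first three methods are used in the same way as the byte-stream write methods; the last two, which take a String, are specific to character streams.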

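    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStreamWriter;

    public class OutputStreamWriterDemo {
        public static void main(String[] args) throws IOException {
            // Create an OutputStreamWriter that uses the default character set
            OutputStreamWriter opsw = new OutputStreamWriter(new FileOutputStream("E:\\abc.txt"));
            // Method 1: write a single character (97 is 'a')
            opsw.write(97);
            opsw.flush(); // flush so the data shows up in the file immediately
            // Method 2: write a character array
            char[] ch = {'a', 'b', 'c', '二'};
            opsw.write(ch);
            opsw.flush(); // flush so the data shows up in the file immediately
            // Method 3: write part of a character array (2 characters starting at index 0)
            opsw.write(ch, 0, 2);
            opsw.flush(); // flush so the data shows up in the file immediately
            // Method 4: write a string
            opsw.write("一二三");
            opsw.flush(); // flush so the data shows up in the file immediately
            // Method 5: write part of a string (2 characters starting at index 1)
            opsw.write("三四五", 1, 2);
            opsw.flush(); // flush so the data shows up in the file immediately
            // Close the writer to release the file; close() also flushes
            opsw.close();
        }
    }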
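
    The effect of flush() is easier to observe with an in-memory target than with a file. In the sketch below, the class FlushDemo and the helper sizesAroundFlush are hypothetical names, and a ByteArrayOutputStream stands in for the FileOutputStream used above. The count seen before flush() depends on the writer's internal buffering, so only the count after flush() is stated.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class FlushDemo {
    // Hypothetical helper: returns {bytes visible before flush(), bytes visible after flush()}.
    static int[] sizesAroundFlush(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStreamWriter writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            writer.write(text);
            int before = sink.size(); // bytes that have reached the sink so far
            writer.flush();           // push the buffered bytes to the sink
            int after = sink.size();
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = sizesAroundFlush("\u4e00\u4e8c\u4e09"); // 一二三
        System.out.println("before flush: " + sizes[0]);
        System.out.println("after flush:  " + sizes[1]); // 9: three characters, three UTF-8 bytes each
    }
}
```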

    6. Two ways to read data with a character stream

    Method                 Description
    int read()             Reads one character at a time
    int read(char[] cbuf)  Reads one character array at a time
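
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;

    public class InputStreamReadDemo {
        public static void main(String[] args) throws IOException {
            // Create an InputStreamReader that uses the default character set
            InputStreamReader ipsr = new InputStreamReader(new FileInputStream("E:\\abc.txt"));
            // Method 1: read one character at a time
            int ch;
            while ((ch = ipsr.read()) != -1) {
                System.out.print((char) ch);
            }
            ipsr.close();
            // Method 2: read one character array at a time
            // The first reader is closed and exhausted, so open a new one on the same file
            ipsr = new InputStreamReader(new FileInputStream("E:\\abc.txt"));
            char[] chs = new char[1024];
            int len;
            while ((len = ipsr.read(chs)) != -1) {
                System.out.print(new String(chs, 0, len));
            }
            ipsr.close();
        }
    }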

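    Summary: when the default encoding is acceptable, the character input stream InputStreamReader can be replaced by its subclass FileReader, and the character output stream OutputStreamWriter by its subclass FileWriter. With the default encoding, each pair behaves the same.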
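
    As a rough illustration of that substitution, the sketch below writes with FileWriter and reads back with FileReader. The class FileReaderWriterDemo and the helper roundTrip are hypothetical names, and a temporary file is used instead of E:\abc.txt. Both classes use the default character set here, so text outside that charset's repertoire may not survive the round trip.

```java
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileReaderWriterDemo {
    // Hypothetical helper: write text with FileWriter, read it back with FileReader.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("abc", ".txt");
            try {
                // FileWriter replaces new OutputStreamWriter(new FileOutputStream(...))
                try (FileWriter fw = new FileWriter(file.toFile())) {
                    fw.write(text);
                }
                // FileReader replaces new InputStreamReader(new FileInputStream(...))
                StringBuilder sb = new StringBuilder();
                try (FileReader fr = new FileReader(file.toFile())) {
                    char[] chs = new char[1024];
                    int len;
                    while ((len = fr.read(chs)) != -1) {
                        sb.append(chs, 0, len);
                    }
                }
                return sb.toString();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello, character streams"));
    }
}
```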

    Statement:
    This article is reproduced from yisu.com. If there is any infringement, please contact admin@php.cn to have it deleted.