1. The origin of character streams
Because byte streams are inconvenient for handling Chinese text, Java provides character streams for that purpose.
Implementation principle: character stream = byte stream + encoding table
Why is there no problem when a byte stream copies a text file that contains Chinese characters?
Because the bytes are copied unchanged, and whatever later opens the copy splices them back into Chinese characters using the encoding table.
How does the decoder recognize that a byte belongs to a Chinese character?
When a Chinese character is stored, whether in UTF-8 or GBK, its first byte is negative when read as a signed Java byte (its high bit is set), which signals that it starts a multi-byte character.
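The negative first byte can be checked directly. The following is a minimal sketch, not part of the original article: the class name ChineseByteDemo and the helper firstByteIsNegative are hypothetical, and the GBK lookup assumes the JDK in use ships the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ChineseByteDemo {
    // True when the first encoded byte is negative as a signed Java byte,
    // which means its high bit is set.
    static boolean firstByteIsNegative(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return bytes.length > 0 && bytes[0] < 0;
    }

    public static void main(String[] args) {
        String text = "\u4e2d"; // the character 中, escaped so the source file encoding does not matter

        // UTF-8 stores this character in three bytes
        System.out.println(Arrays.toString(text.getBytes(StandardCharsets.UTF_8))); // [-28, -72, -83]

        // GBK stores it in two bytes (assumption: this JDK provides the GBK charset)
        System.out.println(Arrays.toString(text.getBytes(Charset.forName("GBK")))); // [-42, -48]

        // ASCII letters are stored as a single non-negative byte
        System.out.println(firstByteIsNegative("a", StandardCharsets.UTF_8)); // false
    }
}
```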
2. Encoding tables
Character set:
A character set is the collection of all characters a system supports, including national scripts, punctuation marks, graphic symbols, numbers, etc.
To store and recognize the symbols of a character set accurately, a computer must encode the characters, so every character set has at least one character encoding.
Common character sets include the ASCII character set, the GBXXX character sets, the Unicode character set, etc.
GBK: the most commonly used Chinese code table. It is an extension of the GB2312 standard, uses a double-byte encoding scheme, and contains 21,003 Chinese characters. It is fully compatible with GB2312 and supports traditional Chinese characters as well as the Chinese characters used in Japanese and Korean.
GB18030: the latest Chinese code table. It contains 70,244 Chinese characters and uses a multi-byte encoding in which each character takes 1, 2 or 4 bytes. It supports the scripts of China's ethnic minorities, as well as traditional Chinese characters and the Chinese characters used in Japanese and Korean.
Unicode character set:
Unicode is designed to express any character in any language. It is an industry standard, also known as the Unified Code or Standard Universal Code, and uses up to 4 bytes to represent each letter, symbol, or character. There are three encoding schemes: UTF-8, UTF-16, and UTF-32. The most commonly used is UTF-8.
UTF-8: it can represent any character in the Unicode standard and is the preferred encoding for email, web pages, and other applications that store or transfer text. The Internet Engineering Task Force (IETF) requires that all Internet protocols support UTF-8. It uses one to four bytes to encode each character.
UTF-8 encoding rules:
The 128 US-ASCII characters need only one byte each
Extended Latin and similar characters need two bytes
Most commonly used characters (including Chinese) need three bytes
Other, rarely used Unicode supplementary characters need four bytes
Summary: whichever rule is used for encoding must also be used for decoding, otherwise the text will be garbled
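The byte counts in the UTF-8 rules above can be observed with a few sample characters. This is an illustrative sketch only; the class name Utf8LengthDemo and the helper utf8Length are hypothetical, and the sample characters are chosen here rather than taken from the article.

```java
import java.nio.charset.StandardCharsets;

class Utf8LengthDemo {
    // Number of bytes the text occupies once encoded as UTF-8
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("a"));            // 1: US-ASCII letter
        System.out.println(utf8Length("\u00e9"));       // 2: é, an extended Latin letter
        System.out.println(utf8Length("\u4e2d"));       // 3: 中, a common Chinese character
        System.out.println(utf8Length("\uD83D\uDE00")); // 4: U+1F600, a supplementary character
    }
}
```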
3. Encoding and decoding strings
Encoding methods (as run in IDEA):
byte[] getBytes(): Encodes the String into a sequence of bytes using the platform's default character set, and stores the result in a new byte array
byte[] getBytes(String charsetName): Encodes the String into a sequence of bytes using the specified character set, and stores the result in a new byte array
Decoding methods (as run in IDEA):
String(byte[] bytes): Constructs a new String by decoding the specified byte array using the platform's default character set
String(byte[] bytes, String charsetName): Constructs a new String by decoding the specified byte array using the specified character set
The default encoding in IDEA is UTF-8
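A short sketch of what happens when the encoding and decoding character sets match and when they do not. The class StringCodecDemo and the helper recode are hypothetical names. To keep the helper free of checked exceptions it uses the Charset overloads of getBytes and the String constructor; the String-name overloads listed above do the same work but declare UnsupportedEncodingException. The GBK line assumes the JDK provides that charset.

```java
import java.nio.charset.Charset;

class StringCodecDemo {
    // Encode the text with one character set, then decode the bytes with another
    static String recode(String text, String encodeWith, String decodeWith) {
        byte[] bytes = text.getBytes(Charset.forName(encodeWith));
        return new String(bytes, Charset.forName(decodeWith));
    }

    public static void main(String[] args) {
        String text = "\u4f60\u597d"; // 你好

        // Same rule for encoding and decoding: the text survives
        System.out.println(recode(text, "UTF-8", "UTF-8"));

        // Different rules: the UTF-8 bytes are misread as GBK and the text is garbled
        System.out.println(recode(text, "UTF-8", "GBK"));
    }
}
```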
4. Encoding and decoding with character streams
Abstract base classes of character streams:
Reader: abstract class for character input streams
Writer: abstract class for character output streams
Two classes handle encoding and decoding in character streams:
InputStreamReader: a bridge from byte streams to character streams. It reads bytes and decodes them into characters using a specified character set. The character set can be specified by name, given explicitly, or left as the platform's default character set
Constructors:
InputStreamReader(InputStream in) | Creates an InputStreamReader that uses the default character set. |
InputStreamReader(InputStream in, String charsetName) | Creates an InputStreamReader that uses the named character set. |
OutputStreamWriter: a bridge from character streams to byte streams. It encodes the characters written to it into bytes using a specified character set. The character set can be specified by name, given explicitly, or left as the platform's default character set
Constructors:
OutputStreamWriter(OutputStream out) | Creates an OutputStreamWriter that uses the default character encoding. |
OutputStreamWriter(OutputStream out, String charsetName) | Creates an OutputStreamWriter that uses the named character set. |
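import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

public class ConversionStreamDemo {
    public static void main(String[] args) throws IOException {
        // Create an OutputStreamWriter that uses the default encoding
        OutputStreamWriter opsw = new OutputStreamWriter(new FileOutputStream("E:\\abc.txt"));
        // Write data, then close the writer so the characters reach the file
        opsw.write("你好啊");
        opsw.close();

        // Create an InputStreamReader that uses the default encoding
        InputStreamReader ipsr = new InputStreamReader(new FileInputStream("E:\\abc.txt"));
        // Read data one character at a time
        int ch;
        while ((ch = ipsr.read()) != -1) {
            System.out.print((char) ch);
        }
        ipsr.close();
    }
}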
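The two-argument constructors listed above take the character set by name. The sketch below shows them with in-memory byte streams, so it runs without a file on disk. The class NamedCharsetStreamDemo and its helpers encode and decode are hypothetical names, and wrapping IOException in UncheckedIOException is a convenience chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;

class NamedCharsetStreamDemo {
    // Characters -> bytes, using the named character set
    static byte[] encode(String text, String charsetName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStreamWriter writer = new OutputStreamWriter(bytes, charsetName)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed here, so every encoded byte has reached the array
        return bytes.toByteArray();
    }

    // Bytes -> characters, using the named character set
    static String decode(byte[] data, String charsetName) {
        StringBuilder text = new StringBuilder();
        try (InputStreamReader reader =
                 new InputStreamReader(new ByteArrayInputStream(data), charsetName)) {
            int ch;
            while ((ch = reader.read()) != -1) {
                text.append((char) ch);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        byte[] data = encode("\u4f60\u597d\u554a", "UTF-8"); // 你好啊
        System.out.println(data.length);           // 9: three bytes per character in UTF-8
        System.out.println(decode(data, "UTF-8")); // the original text
    }
}
```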
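5. Five ways to write data with a character stream
Method | Description |
void write(int c) | Writes a single character |
void write(char[] cbuf) | Writes a character array |
void write(char[] cbuf, int off, int len) | Writes part of a character array |
void write(String str) | Writes a string |
void write(String str, int off, int len) | Writes part of a string |
Character streams buffer what they write. To push the buffered data out to the file, call flush() after the write method.
The first three methods are used the same way as the byte stream write methods, so the focus here is on the last two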
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;

public class OutputStreamWriterDemo {
    public static void main(String[] args) throws IOException {
        // Create an OutputStreamWriter that uses the default encoding
        OutputStreamWriter opsw = new OutputStreamWriter(new FileOutputStream("E:\\abc.txt"));

        // Way 1: write a single character (97 is 'a')
        opsw.write(97);
        opsw.flush(); // flush so the data shows up in the file immediately

        // Way 2: write a character array
        char[] ch = {'a', 'b', 'c', '二'};
        opsw.write(ch);
        opsw.flush();

        // Way 3: write part of a character array (2 characters starting at index 0)
        opsw.write(ch, 0, 2);
        opsw.flush();

        // Way 4: write a string
        opsw.write("一二三");
        opsw.flush();

        // Way 5: write part of a string (2 characters starting at index 1)
        opsw.write("三四五", 1, 2);
        opsw.flush();

        // Closing flushes once more and releases the file
        opsw.close();
    }
}
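The effect of flush() can be made visible by writing into an in-memory byte stream and checking how many bytes have arrived before and after the call. This is a sketch under assumptions: the class FlushDemo and the helper sizesAroundFlush are hypothetical, and the size before flushing depends on the JDK's internal buffering (it is typically 0 for a short string, but that is not guaranteed by the API).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class FlushDemo {
    // Returns {bytes that reached the sink before flush(), bytes after flush()}
    static int[] sizesAroundFlush(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStreamWriter writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            writer.write(text);
            int before = sink.size(); // data may still sit in the writer's buffer
            writer.flush();
            int after = sink.size();  // everything written so far has been pushed out
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = sizesAroundFlush("abc");
        System.out.println("before flush: " + sizes[0]);
        System.out.println("after flush:  " + sizes[1]);
    }
}
```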
6. Two ways to read data with a character stream
Method | Description |
int read() | Reads one character at a time |
int read(char[] cbuf) | Reads one character array at a time |
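import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class InputStreamReadDemo {
    public static void main(String[] args) throws IOException {
        // Create an InputStreamReader that uses the default encoding
        InputStreamReader ipsr = new InputStreamReader(new FileInputStream("E:\\abc.txt"));
        // Way 1: read one character at a time
        int ch;
        while ((ch = ipsr.read()) != -1) {
            System.out.print((char) ch);
        }
        ipsr.close();

        // Way 2: read one character array at a time.
        // The first reader is exhausted and closed, so open the file again.
        InputStreamReader ipsr2 = new InputStreamReader(new FileInputStream("E:\\abc.txt"));
        char[] chs = new char[1024];
        int len;
        while ((len = ipsr2.read(chs)) != -1) {
            System.out.print(new String(chs, 0, len));
        }
        ipsr2.close();
    }
}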
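Summary: when the default encoding is used, the character input stream InputStreamReader can be replaced by its subclass FileReader, and the character output stream OutputStreamWriter by its subclass FileWriter. With the default encoding, each pair behaves the same.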
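A minimal sketch of the FileWriter and FileReader shortcut. It writes to a temporary file instead of a fixed path so it can run anywhere. The class FileReaderWriterDemo and the helper roundTrip are hypothetical names. Both classes use the default character set here, so the result for non-ASCII text depends on the platform and JDK version.

```java
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class FileReaderWriterDemo {
    // Writes the text with FileWriter, reads it back with FileReader, and returns what was read
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("char-stream-demo", ".txt");
            try {
                // Same role as new OutputStreamWriter(new FileOutputStream(...)) with the default encoding
                try (FileWriter writer = new FileWriter(file.toFile())) {
                    writer.write(text);
                }
                // Same role as new InputStreamReader(new FileInputStream(...)) with the default encoding
                StringBuilder result = new StringBuilder();
                try (FileReader reader = new FileReader(file.toFile())) {
                    char[] buffer = new char[1024];
                    int len;
                    while ((len = reader.read(buffer)) != -1) {
                        result.append(buffer, 0, len);
                    }
                }
                return result.toString();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello, character streams"));
    }
}
```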