This article explains how character encoding relates to file creation in Java: why an encoding cannot be specified when a file is created, and how to specify one when writing text to the file. The points are illustrated with sample code, which I hope is a useful reference for study or work.
Recommended study: "java Video Tutorial"
Foreword: Recently I have been learning about Java IO streams, and I wanted to practice and consolidate what I had learned by reading and writing files. While using the File class to create a file, I suddenly wondered: how do I specify the encoding the file uses? And then: how do I check the encoding of an existing file?
1. Problem analysis
First I searched the Internet for an answer. The typical result looks like this (corrected here to use FileOutputStream, since an OutputStreamWriter must wrap an output stream, not a FileInputStream):
FileOutputStream fos = new FileOutputStream("xxxx.txt");
OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8");
This code means that when text is written to the file, the characters are encoded as UTF-8. That is different from what I expected. I wanted to specify the encoding at the moment the file is created, something like the following:
File myfile = new File("test.txt", "UTF-8");
if (!myfile.exists()) myfile.createNewFile();
So I checked the official Java 8 API documentation. File does not provide a constructor that takes a character encoding. There is a two-string constructor, File(String parent, String child), but it treats the second argument as a child path name, so the line above would refer to a path named UTF-8 under test.txt rather than set any encoding.
File also provides no setter or getter for a character encoding, which indicates that character encoding is not an inherent property of a file. Properties such as creation time, modification time, and whether the file is readable, writable, or executable are inherent attributes of the file, that is, its metadata; they are part of the file. The encoding is not.
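As a rough illustration of the point above, the sketch below prints the metadata that java.io.File does expose. The class name FileMetadataDemo and the method describe are made up for this example; note that none of the File methods used here has anything to do with encoding.

```java
import java.io.File;
import java.io.IOException;

public class FileMetadataDemo {
    // Summarizes the metadata that java.io.File exposes for a path.
    // There is no encoding-related getter to add to this list.
    static String describe(File file) {
        return "exists=" + file.exists()
                + ", lastModified=" + file.lastModified()
                + ", canRead=" + file.canRead()
                + ", canWrite=" + file.canWrite()
                + ", canExecute=" + file.canExecute()
                + ", length=" + file.length();
    }

    public static void main(String[] args) throws IOException {
        // A temporary file is used so the example does not touch real data.
        File file = File.createTempFile("metadata-demo", ".txt");
        file.deleteOnExit();
        System.out.println(describe(file));
    }
}
```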
2. Character encoding
We know that all information stored in a computer is a string of 0s and 1s (a "01 string"), and text is no exception.
Processing characters involves two steps: encoding and decoding.
Encoding: "map" the characters to a 01 string.
Decoding: "map" the 01 string back to characters.
Different character encodings, such as GBK and UTF-8, use different rules for encoding and decoding.
Take the same text string, the two Chinese characters "中国" ("China"). Saved with UTF-8 encoding, each Chinese character generally takes three bytes, so the string occupies six bytes.
Saved with GBK encoding, each Chinese character takes two bytes, so the same string occupies four bytes.
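The byte counts above can be checked with a short sketch. The class name EncodingBytesDemo and the helper toHex are made up for this example, and it assumes the JDK in use ships the GBK charset (standard JDK distributions normally do). The string is written with Unicode escapes so the example does not depend on the encoding of the source file itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBytesDemo {
    // Formats a byte array as space-separated uppercase hex pairs.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u56fd"; // the two characters meaning "China"
        Charset gbk = Charset.forName("GBK"); // assumes GBK is available in this JDK

        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        byte[] gbkBytes = text.getBytes(gbk);

        System.out.println("UTF-8 (" + utf8Bytes.length + " bytes): " + toHex(utf8Bytes));
        System.out.println("GBK   (" + gbkBytes.length + " bytes): " + toHex(gbkBytes));
    }
}
```

The same two characters produce six bytes under UTF-8 and four bytes under GBK, and the byte values themselves differ.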
When we write text in a text editor and save it, the editor "maps" the text to a 01 string according to the character encoding you have selected.
The encoding you select is just the conversion rule the editor uses to encode text into a 01 string; it is not an attribute of the text.
When the editor opens a text file, what it displays is not the underlying 01 string but text. This is because the editor uses some character encoding to decode the 01 string into characters. If the encoding used for decoding is the same as, or compatible with, the one used for encoding, the text is displayed correctly. If it is different or incompatible, the characters come out garbled.
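For example, suppose I have a text file saved with GBK encoding whose content is "When will the bright moon come out". Opened with GBK, it displays correctly; opened with an incompatible encoding such as UTF-8, it displays garbled characters. The bytes in the file are identical in both cases; only the decoding rule differs.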
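The effect of decoding with the wrong encoding can be reproduced in a few lines. This is only a sketch: the class name DecodeMismatchDemo and the helper decode are made up, the shorter string "中国" stands in for the file content, and GBK is assumed to be available in the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeMismatchDemo {
    // Decodes raw bytes into text using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumes GBK is available in this JDK
        String original = "\u4e2d\u56fd";

        // "Save" the text: encode it to bytes with GBK.
        byte[] stored = original.getBytes(gbk);

        // "Open" the same bytes twice, with two different decoding rules.
        String decodedWithGbk = decode(stored, gbk);
        String decodedWithUtf8 = decode(stored, StandardCharsets.UTF_8);

        System.out.println("Decoded as GBK matches original:   " + decodedWithGbk.equals(original));
        System.out.println("Decoded as UTF-8 matches original: " + decodedWithUtf8.equals(original));
        System.out.println("Decoded as UTF-8 looks like: " + decodedWithUtf8);
    }
}
```

The stored bytes never change; only the charset passed to the decoder decides whether the text is readable.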
I have said all this to make one point: character encoding is the rule used when encoding and decoding, not an inherent attribute of the file.
I can't help but wonder: why isn't character encoding made part of a file's attributes? Suppose it could be set, and were set to GBK. The operating system would then have to enforce it, just as it refuses a write when a program tries to write to a file that is not writable. Every byte written to the file would have to satisfy the GBK encoding rules, so the operating system would need to check the validity of the bytes on every write. That would carry a very large performance cost and may not even be possible to implement, because some byte sequences are valid in both GBK and UTF-8, which makes them ambiguous.
And what would be the point? So that an editor can pick the correct encoding from the file's attributes when opening it? That is not necessary. A smart editor can often infer the encoding from the first few bytes of the content, and you can also manually choose the character encoding used for decoding.
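The ambiguity mentioned above can be seen with a small validity check. This is a sketch only: the class name EncodingAmbiguityDemo and the method isValid are made up for illustration, and GBK is assumed to be available in the JDK. It uses a strict CharsetDecoder that reports errors instead of silently replacing bad input.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingAmbiguityDemo {
    // Returns true if the bytes decode without error under the given charset.
    static boolean isValid(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumes GBK is available in this JDK

        // Plain ASCII bytes are valid under both encodings, so they reveal nothing.
        byte[] ascii = "Java".getBytes(StandardCharsets.US_ASCII);
        System.out.println("ASCII bytes valid as GBK:   " + isValid(ascii, gbk));
        System.out.println("ASCII bytes valid as UTF-8: " + isValid(ascii, StandardCharsets.UTF_8));

        // GBK-encoded Chinese text is valid GBK but not valid UTF-8.
        byte[] gbkBytes = "\u4e2d\u56fd".getBytes(gbk);
        System.out.println("GBK bytes valid as GBK:     " + isValid(gbkBytes, gbk));
        System.out.println("GBK bytes valid as UTF-8:   " + isValid(gbkBytes, StandardCharsets.UTF_8));
    }
}
```

A check like this can rule some encodings out, but when the bytes are valid under several encodings it cannot say which one the writer intended.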
3. Problem Solving
When creating a file, you cannot specify its encoding. When writing text to a file (for example, pressing Ctrl+S in a text editor, which is essentially a write operation), you can choose the encoding rule used to convert the text into a 01 string.
For Java programs, the code is as follows, which is the code mentioned at the beginning of the article:
FileOutputStream fos = new FileOutputStream("xxxx.txt");
OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8");
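To round this off, here is a minimal, self-contained sketch of writing text with an explicit encoding and reading it back. The class name WriteWithEncodingDemo, the method names, and the use of a temporary file are choices made for this example, not part of the original snippet. It passes StandardCharsets.UTF_8 instead of the string "UTF-8", which avoids a checked UnsupportedEncodingException.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteWithEncodingDemo {
    // Writes text to the file, encoding the characters as UTF-8.
    static void writeUtf8(Path path, String text) throws IOException {
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(path.toFile()), StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    // Reads the raw bytes of the file and decodes them as UTF-8.
    static String readUtf8(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    // Writes text to a temporary file and checks that it reads back unchanged.
    static boolean roundTrip(String text) {
        try {
            Path path = Files.createTempFile("encoding-demo", ".txt");
            try {
                writeUtf8(path, text);
                return readUtf8(path).equals(text);
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Round trip succeeded: " + roundTrip("\u4e2d\u56fd"));
    }
}
```

The encoding is chosen at the moment of writing and again at the moment of reading; the file itself only stores the resulting bytes.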