Java implementation method of specifying encoding when creating a file

WBOY

Aug 24, 2022 am 09:09 AM

This article explains how Java handles character encoding when creating and writing files, and why an encoding cannot be specified when a file is created. It works through the question with sample code.

Foreword: I recently studied Java I/O streams and wanted to practice and consolidate what I had learned by reading and writing files. While using the File class to create a file, a question occurred to me: how do I specify the encoding the file uses? And then: how do I check the encoding of an existing file?

1. Problem analysis

I first searched online for an answer. The results looked like this:

// An OutputStreamWriter wraps an output stream, so this must be a FileOutputStream
FileOutputStream fos = new FileOutputStream("xxxx.txt");
OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8");

The code above means that characters written to the file are encoded as UTF-8. That is different from what I expected. I wanted to specify the encoding when creating the file, like this:

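// Wished-for API only: File has no constructor that takes an encoding.
// (File(String parent, String child) exists, but it would treat "UTF-8" as a path component.)
File myfile = new File("test.txt", "UTF-8");
if (!myfile.exists()) myfile.createNewFile();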

So I checked the official Java 8 API documentation. File does not provide a constructor that takes a character encoding.

File also provides no setter or getter for a character encoding, which indicates that character encoding is not an inherent property of a file. Creation time, modification time, and whether the file is readable, writable, or executable are inherent attributes of a file, its metadata; they are part of the file.
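As a rough illustration of that distinction, the sketch below creates an empty temporary file and prints some of its metadata through java.nio.file. The class name FileMetadataDemo and the helper describeEmptyFile are made up for this example. Size and timestamps are available, but nothing in the attributes describes an encoding.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

class FileMetadataDemo {
    // Hypothetical helper: creates an empty temp file, reads its attributes, then deletes it.
    static BasicFileAttributes describeEmptyFile() {
        try {
            Path path = Files.createTempFile("metadata-demo", ".txt");
            try {
                return Files.readAttributes(path, BasicFileAttributes.class);
            } finally {
                Files.delete(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BasicFileAttributes attrs = describeEmptyFile();
        System.out.println("size (bytes):  " + attrs.size());
        System.out.println("created:       " + attrs.creationTime());
        System.out.println("last modified: " + attrs.lastModifiedTime());
        // BasicFileAttributes has no encoding accessor: the file system stores bytes only.
    }
}
```

A freshly created file contains zero bytes, so at creation time there is nothing for an encoding to apply to.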

2. Character encoding

Any information stored in a computer is a sequence of 0s and 1s, and text is no exception.

Processing characters involves two operations, encoding and decoding:

Encoding: maps characters to a bit string
Decoding: maps a bit string back to characters

Different character encodings, such as GBK and UTF-8, use different rules for encoding and decoding.

Take the same Chinese text string (translated here as "China") and save it with UTF-8. UTF-8 generally uses three bytes for each Chinese character (viewed as the hexadecimal form of the underlying bits).

Saved with GBK, each Chinese character takes two bytes.
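The byte counts can be checked with a short sketch. It assumes the sample string was the two-character word "中国" (written with Unicode escapes so that the source file's own encoding does not matter) and that the running JDK ships the GBK charset, which standard JDK distributions normally do. The class name EncodingBytesDemo and the helper toHex are invented for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingBytesDemo {
    // Encodes the text with the given charset and returns the bytes as spaced hex pairs.
    static String toHex(String text, Charset charset) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        String text = "\u4E2D\u56FD"; // assumed sample text: two Chinese characters
        // Three bytes per character in UTF-8: E4 B8 AD E5 9B BD
        System.out.println("UTF-8: " + toHex(text, StandardCharsets.UTF_8));
        // Two bytes per character in GBK: D6 D0 B9 FA
        System.out.println("GBK:   " + toHex(text, Charset.forName("GBK")));
    }
}
```

The same two characters become six bytes under one rule and four under the other, which is all that "saving with an encoding" means.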

When we write text in a text editor and save it, the editor maps the text to a bit string according to the character encoding you have selected.

The encoding you select is only the conversion rule the editor uses to turn text into bits. It is not an attribute of the text.

When the editor opens a text file, it displays text rather than the underlying bits, because it decodes the bit string into characters using some character encoding. If the encoding used for decoding is the same as, or compatible with, the one used for encoding, the text displays correctly. If it is different or incompatible, the text appears garbled.

For example, I have a GBK-encoded text file whose content is "When will the bright moon come out":

I open the file in VS Code (a very easy-to-use text editor from Microsoft). In technical terms, the editor decodes the file, and by default it decodes with UTF-8. Because the text is stored as GBK-encoded bytes (two bytes per character), decoding those bytes as UTF-8 inevitably produces garbled characters: the encoding and decoding rules do not match. Manually selecting GBK fixes this, and the decoded text is no longer garbled.
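The same mismatch can be reproduced in a few lines of Java. This is only a sketch: it assumes the file's content was the line "明月几时有" (written with Unicode escapes) and that the GBK charset is available in the running JDK. The class name MojibakeDemo and its helper methods are invented for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Assumed original text of the sample file, as Unicode escapes.
    static final String ORIGINAL = "\u660E\u6708\u51E0\u65F6\u6709";

    // Simulates the file on disk: the text encoded as GBK bytes.
    static byte[] gbkBytes() {
        return ORIGINAL.getBytes(Charset.forName("GBK"));
    }

    // Decodes the stored bytes with whichever charset the "editor" picks.
    static String open(Charset decodingCharset) {
        return new String(gbkBytes(), decodingCharset);
    }

    public static void main(String[] args) {
        // Wrong rule: invalid UTF-8 sequences are replaced with U+FFFD, so the text is garbled.
        System.out.println("decoded as UTF-8: " + open(StandardCharsets.UTF_8));
        // Matching rule: the original text comes back intact.
        System.out.println("decoded as GBK:   " + open(Charset.forName("GBK")));
    }
}
```

The bytes never change between the two calls. Only the decoding rule does, which is why switching the editor to GBK repairs the display.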

Garbled text also shows, indirectly, that character encoding is not an inherent attribute of the file.

I have said all this to make one point:

Character encoding is the rule used when decoding and encoding, not an inherent attribute of the file.

I can't help but wonder: why isn't character encoding part of a file's attributes?

Suppose it could be set, say to GBK. The operating system would then have to enforce it, just as it refuses a write when a program tries to write to a file that is not writable. Every byte written would have to be valid GBK, so the operating system would need to check the validity of each write. That carries a very large performance cost and may not even be possible, because some byte sequences are valid in both GBK and UTF-8, which is ambiguous.

And what would be the point? To let an editor pick the correct encoding from that attribute when opening the file? That is unnecessary. A smart editor can often infer the encoding from the first few bytes of the content, and you can always set the decoding charset manually.
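Both the ambiguity and the idea of guessing can be sketched with a strict trial decode. This is a hypothetical heuristic for illustration, not a description of how any particular editor detects encodings. The class name EncodingGuessDemo and the helper decodesCleanly are invented, and the GBK charset is assumed to be available in the running JDK.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingGuessDemo {
    // Returns true if the bytes decode under the charset without any malformed or unmappable input.
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        byte[] ascii = "hello".getBytes(StandardCharsets.US_ASCII);
        byte[] chineseAsGbk = "\u4E2D\u56FD".getBytes(gbk);

        // Plain ASCII bytes are valid under both rules, so the bytes alone cannot say which was used.
        System.out.println("ascii as UTF-8: " + decodesCleanly(ascii, StandardCharsets.UTF_8));
        System.out.println("ascii as GBK:   " + decodesCleanly(ascii, gbk));

        // GBK-encoded Chinese text is not valid UTF-8, so UTF-8 can be ruled out for it.
        System.out.println("GBK text as UTF-8: " + decodesCleanly(chineseAsGbk, StandardCharsets.UTF_8));
        System.out.println("GBK text as GBK:   " + decodesCleanly(chineseAsGbk, gbk));
    }
}
```

A trial decode can rule encodings out, but it cannot always single one out, which is why editors still let you choose the encoding by hand.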

3. Problem solving

The encoding of a file cannot be specified when the file is created. The choice is made when text is written to the file (for example, pressing Ctrl+S in a text editor, which is essentially a write operation): at that point you choose the rule used to convert the text into bits.

For Java programs the code is as follows. It is essentially the code from the beginning of the article:

// An OutputStreamWriter wraps an output stream, so this must be a FileOutputStream
FileOutputStream fos = new FileOutputStream("xxxx.txt");
OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8");
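A fuller version of that idea might look like the sketch below. It writes to a temporary file rather than xxxx.txt, closes the writer with try-with-resources, and passes a Charset object instead of a charset name. The class name WriteWithEncodingDemo and the helpers writeText and writtenSize are invented for this example, the sample text is assumed, and the GBK charset is assumed to be available in the running JDK.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class WriteWithEncodingDemo {
    // Writes the text to the file, encoding its characters with the given charset.
    static void writeText(File file, String text, Charset charset) {
        try (Writer writer = new OutputStreamWriter(new FileOutputStream(file), charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the text to a temporary file and returns the number of bytes stored on disk.
    static long writtenSize(String text, Charset charset) {
        try {
            File file = File.createTempFile("encoding-demo", ".txt");
            try {
                writeText(file, text, charset);
                return file.length();
            } finally {
                Files.delete(file.toPath());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4E2D\u56FD"; // assumed sample text: two Chinese characters
        System.out.println("UTF-8 bytes on disk: " + writtenSize(text, StandardCharsets.UTF_8));
        System.out.println("GBK bytes on disk:   " + writtenSize(text, Charset.forName("GBK")));
    }
}
```

The file is created the same way in both calls. The encoding only enters at write time, through the charset handed to OutputStreamWriter.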

Statement

This article is reproduced from 脚本之家. If there is any infringement, please contact admin@php.cn to have it removed.
