Guanhui (original), 2020-06-08 09:47:00, 3510 views

# Diamond garbled characters appear in Java code?

Diamond-shaped garbled characters appear in Java code. Generally, it is due to character set issues. For example, the Java file is GBK encoded, but when the editor opens the Java file with UTF-8 encoding, this kind of garbled code is displayed. Solve it Method: Change the editor's encoding to the same encoding as the Java file.
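The mismatch can be reproduced in a few lines. The following is a minimal sketch, not the article's own code: the class name `MojibakeDemo` and the sample text are chosen for illustration, and it assumes the runtime provides the GBK charset (standard JDK distributions do). It encodes two Chinese characters as GBK and then decodes the same bytes once as UTF-8 and once as GBK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Decode raw bytes with the given charset; malformed input is replaced
    // by the charset's replacement string (U+FFFD for UTF-8).
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Assumption: the GBK charset is available in this runtime.
        Charset gbk = Charset.forName("GBK");

        // Unicode escapes keep this source file independent of its own encoding.
        String text = "\u4e2d\u6587";
        byte[] fileBytes = text.getBytes(gbk);

        String wrong = decode(fileBytes, StandardCharsets.UTF_8); // editor guesses UTF-8
        String right = decode(fileBytes, gbk);                    // editor uses GBK

        System.out.println("UTF-8 view contains U+FFFD: " + wrong.contains("\uFFFD"));
        System.out.println("GBK view matches original:  " + right.equals(text));
    }
}
```

Decoding with the wrong charset loses information, so the fix has to happen when the file is opened, not afterwards on the already garbled text.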

Encoding

Encoding is the process of converting information from one form or format to another; decoding is the reverse process. (Writing code in a computer programming language is also informally called coding.) A predetermined method is used to encode characters, numbers, or other objects into numbers, or to convert information and data into prescribed electrical pulse signals. Encoding is widely used in computers, televisions, remote controls, and communications.

Among the GB encoding standards, GB2312 and GBK are the most commonly used. GB2312 is a subset of GBK, and its encoding range is 0xA1A1-0xFEFE. Pure GB2312 text is very simple to process, but the GBK character set needs a little more care. First, the GBK encoding standard itself:

GBK uses a double-byte representation. The overall encoding range is 8140-FEFE: the first byte is in 81-FE and the second byte is in 40-FE, with the xx7F column excluded. This gives 23,940 code points in total, of which 21,886 are assigned to Chinese characters and graphic symbols: 21,003 Chinese characters (including radicals and components) and 883 graphic symbols.
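The code point total follows directly from the byte ranges: 126 possible first bytes (81-FE) times 190 possible second bytes (40-FE without 7F). As a rough sketch, with the class and method names `GbkRange`, `isGbkPair`, and `countCodePoints` invented for this example, the range check and the count can be written as:

```java
final class GbkRange {
    // True if (lead, trail) lies in the overall GBK double-byte range 8140-FEFE,
    // excluding trail byte 0x7F.
    static boolean isGbkPair(int lead, int trail) {
        return lead >= 0x81 && lead <= 0xFE
            && trail >= 0x40 && trail <= 0xFE
            && trail != 0x7F;
    }

    // Count every valid pair by brute force over all byte combinations.
    static int countCodePoints() {
        int count = 0;
        for (int lead = 0; lead <= 0xFF; lead++) {
            for (int trail = 0; trail <= 0xFF; trail++) {
                if (isGbkPair(lead, trail)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countCodePoints()); // 126 * 190 = 23940
    }
}
```

This only checks that a byte pair is inside the range; it does not tell whether a character is actually assigned at that position.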

Encoding classification

1. Chinese character area. Including:

a. GB 2312 Chinese character area. That is GBK/2: B0A1-F7FE. Contains 6763 GB 2312 Chinese characters, arranged in original order.

b. GB 13000.1 extended Chinese character area. Including:

(1) GBK/3: 8140-A0FE. Contains 6080 CJK Chinese characters in GB 13000.1.

(2) GBK/4: AA40-FEA0. Contains 8160 CJK Chinese characters and supplementary Chinese characters.

The CJK Chinese characters come first, arranged in UCS code order; the supplementary Chinese characters (including radicals and components) come last, arranged by their page number and character position in the "Kangxi Dictionary".

2. Graphic symbol area. Including:

a. GB 2312 non-Chinese character symbol area. That is GBK/1: A1A1-A9FE. In addition to the symbols of GB 2312, it also has 10 lowercase Roman numerals and the symbols supplemented by GB 12345. There are 717 symbols in total.

b. GB 13000.1 extended non-Chinese character area. That is GBK/5: A840-A9A0. Big5 non-Chinese characters, structural symbols, and "○" are placed in this area. There are 166 symbols in total.

3. User-defined area: divided into three areas (1)(2)(3).

(1) AAA1-AFFE, 564 code points.

(2) F8A1-FEFE, 658 code points.

(3) A140-A7A0, 672 code points.

Although area (3) is open to users, its use is restricted because the possibility of adding new characters to this area in the future cannot be ruled out.
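The areas listed above are non-overlapping rectangles of (first byte, second byte) values, so a byte pair can be classified with a small lookup table. The sketch below uses the ranges exactly as listed; the class `GbkAreas` and its methods are illustrative names, not part of any library. Note that `size` counts code positions, which equals the stated character count only for areas that are completely filled (GBK/3, GBK/4, and the three user-defined areas).

```java
final class GbkAreas {
    static final String[] NAMES = {
        "GBK/1", "GBK/2", "GBK/3", "GBK/4", "GBK/5",
        "User (1)", "User (2)", "User (3)"
    };

    // {leadMin, leadMax, trailMin, trailMax} for each area, in the same order as NAMES.
    static final int[][] BOUNDS = {
        {0xA1, 0xA9, 0xA1, 0xFE}, // GBK/1: A1A1-A9FE
        {0xB0, 0xF7, 0xA1, 0xFE}, // GBK/2: B0A1-F7FE
        {0x81, 0xA0, 0x40, 0xFE}, // GBK/3: 8140-A0FE
        {0xAA, 0xFE, 0x40, 0xA0}, // GBK/4: AA40-FEA0
        {0xA8, 0xA9, 0x40, 0xA0}, // GBK/5: A840-A9A0
        {0xAA, 0xAF, 0xA1, 0xFE}, // User (1): AAA1-AFFE
        {0xF8, 0xFE, 0xA1, 0xFE}, // User (2): F8A1-FEFE
        {0xA1, 0xA7, 0x40, 0xA0}, // User (3): A140-A7A0
    };

    // Return the area name for a byte pair, or "invalid" if it is outside GBK.
    static String classify(int lead, int trail) {
        if (trail == 0x7F) {
            return "invalid"; // the xx7F column is excluded everywhere
        }
        for (int i = 0; i < BOUNDS.length; i++) {
            int[] b = BOUNDS[i];
            if (lead >= b[0] && lead <= b[1] && trail >= b[2] && trail <= b[3]) {
                return NAMES[i];
            }
        }
        return "invalid";
    }

    // Number of code positions in an area (not the number of assigned characters).
    static int size(int area) {
        int[] b = BOUNDS[area];
        int leads = b[1] - b[0] + 1;
        int trails = b[3] - b[2] + 1;
        if (b[2] <= 0x7F && 0x7F <= b[3]) {
            trails--; // drop the excluded 0x7F trail byte
        }
        return leads * trails;
    }

    public static void main(String[] args) {
        System.out.println(classify(0xB0, 0xA1)); // GBK/2
        System.out.println(size(2));              // 6080 positions in GBK/3
    }
}
```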

Here are a few tips:

1. In PHP, a string keeps whatever encoding it was sent in, so user input stays in the encoding the user submitted and is not converted automatically. In ASP, however, the default encoding is Unicode, so a GBK-to-Unicode mapping table is easy to obtain, and with it GBK-to-UTF-8 conversion can be implemented easily even without any supporting library;

2. The lowest value of the second (trail) byte of a GBK character is 0x40, which is 64. Therefore, when assembling strings that contain Chinese text, it is best to separate fields with ASCII characters whose code is below 64, so that replacing or splitting never produces garbled characters. Commonly used separators are ",", ";", ":", and " ". These characters never cause confusion in GB encodings.
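The separator advice can be checked at the byte level: the second byte of a GBK pair is always 0x40 or higher, so a byte below 0x40 can only ever be a real ASCII character. The sketch below is illustrative only (`GbkDelimiters`, `isSafeDelimiter`, and `countByte` are made-up names); it builds two valid GBK byte pairs by hand and compares a safe separator with an unsafe one.

```java
final class GbkDelimiters {
    // A single-byte separator below 0x40 can never equal a GBK second byte (0x40-0xFE).
    static boolean isSafeDelimiter(char c) {
        return c < 0x40;
    }

    // Naive byte-level scan that ignores character boundaries,
    // as a simple split or replace on raw bytes would.
    static int countByte(byte[] data, char delimiter) {
        int hits = 0;
        for (byte b : data) {
            if ((b & 0xFF) == delimiter) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Two GBK code points, 0x817C and 0x8140, joined by a comma.
        // The data contains no '|' separator at all.
        byte[] data = {(byte) 0x81, 0x7C, ',', (byte) 0x81, 0x40};

        System.out.println(countByte(data, ',')); // 1: the real separator
        System.out.println(countByte(data, '|')); // 1: false hit inside 0x817C
    }
}
```

Splitting on `|` (0x7C) here would cut the first character in half, which is exactly the kind of garbling the tip is meant to avoid.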
