In what form is Chinese character information usually stored in computers?
Chinese character information is usually stored in computers as internal codes. The Chinese character internal code (also called the "Chinese character ASCII code", or simply the "internal code") is the binary code, made up of 0s and 1s, that a computer uses internally to store, process, and transmit Chinese characters.
Once an input code has been accepted, the "input code conversion module" of the Chinese character operating system converts it into an internal code, regardless of which keyboard input method was used. The internal code is the most basic encoding of Chinese characters: whatever the Chinese character system or input method, the external code (input code) that is typed in must be converted into the internal code inside the machine before the character can be stored or processed.
Detailed explanation
Because a Chinese character processing system must remain compatible with Western text, ambiguity arises when ASCII codes and Chinese character national standard codes coexist in the same system. For example, the two bytes 30H and 21H can represent either the national standard code of the Chinese character "啊" ("ah") or the ASCII codes of the two Western characters "0" and "!". For this reason, the internal code is obtained by applying a suitable transformation to the national standard code.
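The ambiguity can be reproduced with a short Python sketch. It is only an illustration: it assumes Python's built-in `ascii` codec for the Western reading, and it borrows the built-in `hz` codec (HZ-GB-2312, which carries national standard code bytes in 7-bit form between `~{` and `~}` markers) purely as a convenient way to decode the bytes as a national standard code. The variable names are hypothetical.

```python
# The same two bytes, 30H and 21H, read in two different ways.
raw = bytes([0x30, 0x21])

# Reading 1: two ASCII characters.
as_western = raw.decode("ascii")

# Reading 2: one Chinese character in national standard code.
# Assumption: the "hz" codec is used here only because it accepts
# 7-bit national standard code bytes between "~{" and "~}".
as_chinese = (b"~{" + raw + b"~}").decode("hz")

print(as_western)          # 0!
print(as_chinese == "啊")  # True
```

Nothing in the bytes themselves says which reading is intended, which is why a distinct internal code is needed.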
The internal code that corresponds to a national standard code is two bytes long. It is obtained by setting the highest bit of each byte of the national standard code to 1, that is:
Chinese character internal code = Chinese character national standard code + 8080H
For example, the national standard code of the character "啊" ("ah") mentioned above is 3021H, so its internal code is 3021H + 8080H = B0A1H.
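The conversion can be sketched in Python as follows. The function name `national_to_internal` is hypothetical, and the range check assumes the usual GB 2312 byte range of 21H to 7EH; Python's built-in `gb2312` codec, which expects bytes in internal-code form, is used only to confirm the result.

```python
def national_to_internal(national: int) -> int:
    """Convert a two-byte national standard code to the internal code."""
    high, low = national >> 8, national & 0xFF
    # Assumption: each byte of a valid national standard code is 21H..7EH.
    if not (0x21 <= high <= 0x7E and 0x21 <= low <= 0x7E):
        raise ValueError(f"not a national standard code: {national:04X}H")
    # Both bytes are below 80H, so adding 8080H is the same as
    # setting the highest bit of each byte (national | 0x8080).
    return national + 0x8080

internal = national_to_internal(0x3021)
print(f"{internal:04X}H")  # B0A1H

# The gb2312 codec decodes internal-code bytes directly.
decoded = internal.to_bytes(2, "big").decode("gb2312")
print(decoded == "啊")     # True
```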
The Chinese character internal code is based on the Chinese character national standard code.
Internal code: to avoid ambiguity when ASCII codes and national standard codes are used at the same time, most Chinese character systems set the highest bit of each byte of the national standard code to 1 and use the result as the internal code of the Chinese character. This removes the ambiguity between the internal codes of Chinese characters and those of Western characters, and it also keeps the correspondence between the internal code and the national standard code very simple.
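As an illustration of how the highest bit removes the ambiguity, the following Python sketch splits a mixed byte stream into ASCII characters and two-byte Chinese characters. The function name `split_mixed` is hypothetical, and the code assumes well-formed input in which every byte with the highest bit set starts a complete two-byte internal code.

```python
def split_mixed(data: bytes) -> list:
    """Split mixed ASCII / internal-code bytes into characters."""
    chars = []
    i = 0
    while i < len(data):
        if data[i] & 0x80:
            # Highest bit is 1: first byte of a two-byte internal code.
            # A truncated or invalid pair raises UnicodeDecodeError.
            chars.append(data[i:i + 2].decode("gb2312"))
            i += 2
        else:
            # Highest bit is 0: a single ASCII character.
            chars.append(chr(data[i]))
            i += 1
    return chars

# "0!" as ASCII, then B0A1H (internal code of 啊), then "A".
mixed = b"0!" + bytes([0xB0, 0xA1]) + b"A"
result = split_mixed(mixed)
print(len(result))                       # 4
print(result == ["0", "!", "啊", "A"])   # True
```

Here the bytes 30H 21H are read as "0" and "!", while B0H A1H is read as one Chinese character, so the two no longer collide.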
The relationship between the Chinese character internal code, the national standard code, and the location code is as follows. Converting each of the two bytes of the location code (the area number and the position number, in decimal) to hexadecimal and adding 20H to each (2020H overall) gives the corresponding national standard code. Setting the highest bit of each of the two bytes of the national standard code (the Chinese character exchange code) to 1, that is, adding 80H to each byte, gives the corresponding internal code. Equivalently, converting each of the two bytes of the location code to hexadecimal and adding A0H to each gives the corresponding internal code directly.
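These three relationships can be checked with a small Python sketch. The function names are hypothetical, the 1 to 94 range check assumes the GB 2312 layout of 94 areas with 94 positions each, and the location code 1601 for "啊" is derived by working backwards from its national standard code 3021H (30H - 20H = 10H = 16, 21H - 20H = 01H = 1).

```python
def _check(area: int, position: int) -> None:
    # Assumption: GB 2312 uses areas 1..94 and positions 1..94.
    if not (1 <= area <= 94 and 1 <= position <= 94):
        raise ValueError("area and position must be in 1..94")

def location_to_national(area: int, position: int) -> int:
    """Location code -> national standard code (add 20H to each byte)."""
    _check(area, position)
    return ((area + 0x20) << 8) | (position + 0x20)

def location_to_internal(area: int, position: int) -> int:
    """Location code -> internal code (add A0H to each byte)."""
    _check(area, position)
    return ((area + 0xA0) << 8) | (position + 0xA0)

# Location code 1601: area 16, position 01.
national = location_to_national(16, 1)
internal = location_to_internal(16, 1)
print(f"{national:04X}H")             # 3021H
print(f"{internal:04X}H")             # B0A1H
print(national + 0x8080 == internal)  # True
```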