Detailed explanation of ord($str) > 0x80 in PHP
GBK, the simplified Chinese character set, encodes each character in either 1 byte or 2 bytes. When the first (high) byte is in the range 0x00~0x7f, the character is a single byte. When the first byte is 0x80 or above, the character is represented by 2 bytes (valid GBK lead bytes are 0x81~0xfe).
Note: all numbers in parentheses are binary.
When the value of a byte is greater than 0x7f, it must be part of a Chinese character (joined together with another byte). How do we determine that a byte is definitely greater than 0x7f?
The number after 0x7f (1111111) is 0x80 (10000000), so for a byte to be greater than 0x7f, its highest bit must be 1. We only need to check whether the highest bit is 1.
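To make the rule concrete, here is a minimal sketch that walks a GBK byte string and groups the bytes into characters. The function name gbk_split is hypothetical and not part of PHP or of the original article. The sketch assumes the input is well-formed GBK and does not validate the second byte.

```php
<?php
// Hypothetical helper (an assumption for illustration): split a GBK-encoded
// byte string into characters using the rule "a byte greater than 0x7f
// starts a 2-byte character".
function gbk_split(string $bytes): array
{
    $chars = [];
    $len = strlen($bytes);
    for ($i = 0; $i < $len; ) {
        // ord() returns the byte value as an integer in the range 0..255.
        $width = (ord($bytes[$i]) > 0x7f) ? 2 : 1;
        $chars[] = substr($bytes, $i, $width);
        $i += $width;
    }
    return $chars;
}

// "a", then the GBK bytes 0xD6 0xD0 (the character 中), then "b".
$parts = gbk_split("a\xD6\xD0b");
echo count($parts), "\n"; // 3 characters from 4 bytes
```

Cutting a GBK string at an arbitrary byte offset can split a Chinese character in half, which is why the byte has to be checked before deciding how far to advance.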
How to check:
Bitwise AND (a result bit is 1 only when both corresponding bits are 1; otherwise it is 0):
For example: to determine whether the third bit of a number is 1, bitwise AND it with 4 (100). To determine whether the second bit of a number is 1, bitwise AND it with 2 (10).
Similarly, to determine whether the eighth bit is 1, bitwise AND the number with (10000000), which is 0x80.
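The bit tests described above can be sketched as follows. The helper name bit_is_set is an assumption made for this illustration, and bits are counted from 1 starting at the rightmost bit, as in the text.

```php
<?php
// Hypothetical helper (an assumption for illustration): is bit number $n
// of $value set? Bits are counted from 1 at the rightmost position.
function bit_is_set(int $value, int $n): bool
{
    $mask = 1 << ($n - 1); // n = 3 gives 4 (100), n = 8 gives 0x80 (10000000)
    return ($value & $mask) !== 0;
}

var_dump(bit_is_set(5, 3));            // 5 is (101): bool(true)
var_dump(bit_is_set(5, 2));            // bool(false)
var_dump(bit_is_set(ord("\xD6"), 8));  // 0xD6 is (11010110): bool(true)
var_dump((ord("a") & 0x80) !== 0);     // 97 is (1100001): bool(false)
```

Note the parentheses around the AND expression: in PHP the comparison operators bind more tightly than &, so writing $value & $mask !== 0 without them would compare first and AND second.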
Why not simply use > 0x7f here? In PHP that works, but in some other strongly typed languages the highest bit of a signed 1-byte value marks a negative number, and a negative number can never be greater than 0x7f (the largest value a signed byte can hold).
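The difference can be imitated in PHP itself. This is only a sketch: it uses unpack() with the 'c' format (signed char) to read the same byte the way a language with signed bytes would see it, while ord() reads it as an unsigned value.

```php
<?php
$byte = "\xD6"; // first byte of a GBK two-byte character

$unsigned = ord($byte);            // 214: PHP sees the byte as 0..255
$signed   = unpack('c', $byte)[1]; // -42: the same bits read as a signed char

var_dump($unsigned > 0x7f);        // bool(true)
var_dump($signed > 0x7f);          // bool(false): the comparison misses it
var_dump(($signed & 0x80) !== 0);  // bool(true): the bit test still works
```

The bitwise AND gives the same answer for both readings because it looks only at the bit pattern, not at how the value is interpreted.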
Another example:
The ASCII code of a is 97 (1100001)
The ASCII code of A is 65 (1000001)
The ASCII code of b is 98 (1100010)
The ASCII code of B is 66 (1000010)
There is a pattern: for every lowercase letter a-z the sixth bit is 1, while for the matching uppercase letter it is 0. We can use this to determine the case.
We only need to bitwise AND the letter's code with 0x20 (100000) and check whether the result is nonzero.
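A minimal sketch of the case check follows. The function name is_lower_ascii_letter is hypothetical, and the check is only meaningful when the input is already known to be a single ASCII letter.

```php
<?php
// Hypothetical helper (an assumption for illustration). Assumes $ch is a
// single ASCII letter, A-Z or a-z; other characters are not handled.
function is_lower_ascii_letter(string $ch): bool
{
    // Bit 6, value 0x20 (100000), is set for a-z and clear for A-Z.
    return (ord($ch) & 0x20) !== 0;
}

var_dump(is_lower_ascii_letter('a')); // 97 is (1100001): bool(true)
var_dump(is_lower_ascii_letter('A')); // 65 is (1000001): bool(false)
var_dump(is_lower_ascii_letter('b')); // 98 is (1100010): bool(true)
var_dump(is_lower_ascii_letter('B')); // 66 is (1000010): bool(false)
```

The test says nothing useful about non-letters. The digit 1, for example, has the code 49 (110001), which also has the sixth bit set.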