String segmentation for UTF-8 (supports Chinese, Japanese, Korean, etc.; efficient)
Because mb_substr and mb_strlen are too slow, this code is used instead.
The code is not original. Its main principle is the byte layout of UTF-8, in which the first byte of each character indicates how many bytes that character occupies:

0xxxxxxx
110xxxxx 10xxxxxx
1110xxxx 10xxxxxx 10xxxxxx
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

Reading the first byte gives the character boundary and therefore the number of bytes the character occupies, so the string can be split into an array of characters. (The 5- and 6-byte forms come from the original UTF-8 definition; RFC 3629 limits UTF-8 to at most 4 bytes per character.) This is convenient for code that operates on individual characters frequently.

In my comparison, this function is about 10 times faster than mb_substr. I once wrote an "N million banned word replacement class"; while developing that class I compared the efficiency of the two in detail, and this function clearly won.
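The lead-byte technique described above can be sketched as follows. This is a minimal illustration, not the code the article refers to: the function name utf8_split is hypothetical, it handles only the 1- to 4-byte forms, and it assumes well-formed input (continuation bytes are not validated; an unexpected byte is emitted as a one-byte element).

```php
<?php
// Hypothetical helper: split a UTF-8 string into an array of characters
// by inspecting only the lead byte of each character.
function utf8_split(string $str): array
{
    $chars = [];
    $len = strlen($str); // length in bytes, not characters
    $i = 0;
    while ($i < $len) {
        $byte = ord($str[$i]);
        if ($byte < 0x80) {
            $width = 1; // 0xxxxxxx
        } elseif (($byte & 0xE0) === 0xC0) {
            $width = 2; // 110xxxxx
        } elseif (($byte & 0xF0) === 0xE0) {
            $width = 3; // 1110xxxx
        } elseif (($byte & 0xF8) === 0xF0) {
            $width = 4; // 11110xxx
        } else {
            // Assumption: stray continuation byte or obsolete 5/6-byte
            // lead byte; pass it through as a single byte.
            $width = 1;
        }
        $chars[] = substr($str, $i, $width);
        $i += $width;
    }
    return $chars;
}

// Usage: "a", two CJK characters (U+4E2D, U+6587), then "b".
$chars = utf8_split("a\u{4E2D}\u{6587}b");
echo count($chars), "\n";                     // prints 4
echo implode('', array_slice($chars, 1, 2)), "\n"; // the two CJK characters
```

Once the string is an array, a substring is just array_slice plus implode, and the length is count, which is why splitting once pays off when the same string is accessed by character many times. The "\u{...}" escapes and the scalar type declarations require PHP 7.0 or later.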