A PHP function for detecting whether a string is UTF-8 encoded

怪我咯
2017-07-09 09:26:15

Given a string, how do you determine its encoding? PHP provides a function for this, mb_detect_encoding, but it requires the mbstring extension, which is not enabled everywhere. The is_utf8 function in this article relies only on preg_match instead.
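For comparison, here is a minimal sketch of the mbstring route with a guard for installations where the extension is missing. The helper name detect_encoding_or_null and the candidate encoding list are illustrative assumptions, not part of the original article.

```php
<?php
// Hypothetical helper (not from the article): returns the detected encoding
// name, false if no candidate matches, or null when mbstring is not loaded.
function detect_encoding_or_null(string $bytes)
{
    if (!function_exists('mb_detect_encoding')) {
        return null; // mbstring extension is not available
    }
    // The candidate list is an assumption. Strict mode (third argument)
    // rejects byte sequences that are invalid in a candidate encoding.
    return mb_detect_encoding($bytes, ['UTF-8', 'ISO-8859-1'], true);
}

var_dump(detect_encoding_or_null("h\xC3\xA9llo"));
```

The null return makes the missing-extension case explicit, so a caller can fall back to another check.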

 function is_utf8($string) {
     // preg_match() returns 1 on a match, 0 on no match, and false on error
     // (for example when a very long input exceeds a PCRE limit), so compare
     // against 1 to always return a boolean.
     return preg_match('%^(?:
             [\x09\x0A\x0D\x20-\x7E]              # ASCII (tab, LF, CR, printable)
           | [\xC2-\xDF][\x80-\xBF]               # non-overlong 2-byte
           | \xE0[\xA0-\xBF][\x80-\xBF]           # excluding overlongs
           | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}    # straight 3-byte
           | \xED[\x80-\x9F][\x80-\xBF]           # excluding surrogates
           | \xF0[\x90-\xBF][\x80-\xBF]{2}        # planes 1-3
           | [\xF1-\xF3][\x80-\xBF]{3}            # planes 4-15
           | \xF4[\x80-\x8F][\x80-\xBF]{2}        # plane 16
         )*$%xs', $string) === 1;
 }

Its accuracy is roughly the same as that of mb_detect_encoding: it gets the same cases right and the same cases wrong.
Encoding detection can never be 100% accurate, but this function is good enough for most needs.
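To illustrate why detection cannot be fully accurate, the sketch below shows a byte sequence that is valid in more than one encoding. It uses PCRE's /u modifier as a stand-in validity check rather than the regular expression from this article, and the helper name looks_like_utf8 is a hypothetical one chosen for the example.

```php
<?php
// Stand-in validity check (an assumption, not the article's regex): PCRE's
// /u modifier makes preg_match() fail on input that is not valid UTF-8.
function looks_like_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

// Bytes C3 A9 are "é" in UTF-8, but also the two characters "Ã©" in ISO-8859-1.
$ambiguous = "\xC3\xA9";
// A lone E9 byte is "é" in ISO-8859-1 and is not valid UTF-8.
$latin1Only = "\xE9";

var_dump(looks_like_utf8($ambiguous));  // bool(true)
var_dump(looks_like_utf8($latin1Only)); // bool(false)
```

A validity check can only say that the bytes could be UTF-8. It cannot tell whether the author of the first string meant "é" or "Ã©".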
