
Method 2 for displaying web pages correctly in any character set (continued)

WBOY
2016-07-21 15:55:45

Reposted from: coolcode.cn
A few days ago I wrote an article on how to display web pages correctly in any character set. The idea is very simple: every character outside the first 128 is written as a numeric character reference (NCR). I did not describe the actual conversion method at the time because I thought it was too simple, but people have since asked about it, so I will explain it in detail here.
The first step is to convert the string from the source character set to UTF-16. In UTF-16 every character in the Basic Multilingual Plane occupies exactly two bytes, which makes the later processing simple; working directly on the source character set would be far more complicated. The source character set can be read from the meta tag of the original page or specified separately. My program lets the user specify it in the form, because I cannot guarantee that the submitted file is an HTML file (other files work too; for example, the Chinese language pack source for WordPress is a po file, and its content can be processed the same way), and even an HTML file does not necessarily carry a meta tag that declares the character set. Specifying the character set through the form is therefore the safer choice.
You may think that converting one character set to another is complicated. Implementing it yourself would indeed be a lot of work, but PHP already ships functions for it: the iconv function converts between a wide range of character sets. If the iconv extension is not installed on your machine, you can use the mb_convert_encoding function instead. If the Multibyte String extension is not installed either, there is little you can do, because writing converters for that many encodings yourself is not realistic unless you are a top expert. I recommend iconv because it is more efficient and supports more character sets.
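The conversion step with the fallback described above could be sketched as follows. The helper name to_utf16be is an assumption made for illustration and does not appear in the original program; iconv and mb_convert_encoding are the real PHP functions named in the text.

```php
<?php
// Hypothetical helper (not part of the original article): convert $str from
// the $encode character set to UTF-16BE, preferring iconv over mbstring.
function to_utf16be(string $str, string $encode): string
{
    if (function_exists('iconv')) {
        $converted = iconv($encode, 'UTF-16BE', $str);
        if ($converted === false) {
            throw new RuntimeException("iconv could not convert from $encode");
        }
        return $converted;
    }
    if (function_exists('mb_convert_encoding')) {
        // Argument order differs from iconv: target encoding first, then source.
        return mb_convert_encoding($str, 'UTF-16BE', $encode);
    }
    throw new RuntimeException('Neither iconv nor mbstring is available.');
}

// U+4E2D is E4 B8 AD in UTF-8 and 4E 2D in UTF-16BE.
echo bin2hex(to_utf16be("\xE4\xB8\xAD", 'UTF-8')), "\n"; // prints 4e2d
```

Naming the byte order explicitly (UTF-16BE rather than UTF-16) keeps the high byte first, which is what the two-byte arithmetic in the next step relies on.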
After that, process the string two bytes at a time. Each pair of bytes is converted directly into a number, which is the xxxxx in &#xxxxx;. If the number is less than 128, output the character itself (note that it becomes a single byte here); otherwise output it in the form &#xxxxx;. One thing to watch for: when the number is 65279 (hexadecimal 0xFEFF), skip it. This is the Unicode byte order mark (BOM), and since the output contains only the first 128 characters of ISO-8859-1 (that is, ASCII), it is not needed.
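To make the arithmetic concrete, here is a small worked example for a single character. The character U+4E2D is chosen only as an illustration, and the use of unpack is an alternative way to read the same numbers, not something the original program does.

```php
<?php
// Worked example: the character U+4E2D encoded as UTF-16BE is the bytes 4E 2D.
$utf16be = "\x4E\x2D";

// High byte times 256 plus low byte: 78 * 256 + 45.
$code = ord($utf16be[0]) * 256 + ord($utf16be[1]);
echo $code, "\n";               // prints 20013
echo "&#" . $code . ";", "\n";  // prints &#20013;

// unpack('n*', ...) reads big-endian 16-bit integers and yields the same values.
// Here the input is "A" (00 41) followed by U+4E2D (4E 2D).
$units = unpack('n*', "\x00\x41" . $utf16be);
print_r($units);                // element 1 is 65, element 2 is 20013
```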
That is the basic idea. Here is the implementation:
Download: nochaoscode.php


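<?php
// Convert $str from the $encode character set to ASCII plus NCRs.
function nochaoscode($encode, $str) {
    $str = iconv($encode, "UTF-16BE", $str);
    $output = "";
    // Walk the UTF-16BE string two bytes (one code unit) at a time.
    for ($i = 0; $i + 1 < strlen($str); $i += 2) {
        $code = ord($str[$i]) * 256 + ord($str[$i + 1]);
        if ($code < 128) {
            // ASCII: emit the character itself as a single byte.
            $output .= chr($code);
        } else if ($code != 65279) {
            // Everything else except the BOM (0xFEFF) becomes an NCR.
            $output .= "&#" . $code . ";";
        }
    }
    return $output;
}
?>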

The function takes two parameters: $encode is the source character set and $str is the string to convert. The return value is the converted string.
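One limitation is worth noting: treating every two bytes as one character only holds for the Basic Multilingual Plane. Characters beyond it are stored in UTF-16 as a surrogate pair of two 16-bit units, and each half would end up as its own reference. The following is a sketch of a variant that combines such pairs; the name nochaoscode_astral and the surrogate handling are additions for illustration, not part of the original program.

```php
<?php
// Hypothetical variant of nochaoscode(); not the original author's code.
function nochaoscode_astral(string $encode, string $str): string
{
    $utf16 = iconv($encode, 'UTF-16BE', $str);
    if ($utf16 === false) {
        throw new RuntimeException("Cannot convert from $encode");
    }
    if ($utf16 === '') {
        return '';
    }
    // Read the string as a list of big-endian 16-bit code units.
    $units = array_values(unpack('n*', $utf16));
    $count = count($units);
    $output = '';
    for ($i = 0; $i < $count; $i++) {
        $code = $units[$i];
        // A high surrogate followed by a low surrogate encodes one code point.
        if ($code >= 0xD800 && $code <= 0xDBFF && $i + 1 < $count
            && $units[$i + 1] >= 0xDC00 && $units[$i + 1] <= 0xDFFF) {
            $low = $units[++$i];
            $code = 0x10000 + (($code - 0xD800) << 10) + ($low - 0xDC00);
        }
        if ($code < 128) {
            $output .= chr($code);          // ASCII stays as a single byte
        } elseif ($code !== 0xFEFF) {
            $output .= '&#' . $code . ';';  // everything else except the BOM
        }
    }
    return $output;
}

// "A", U+4E2D and U+1F600 (which needs a surrogate pair in UTF-16).
echo nochaoscode_astral('UTF-8', "A\u{4E2D}\u{1F600}"), "\n"; // prints A&#20013;&#128512;
```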
Supplement: Today Legend told me a simpler method: use the mb_convert_encoding function directly. mb_convert_encoding supports an encoding called HTML-ENTITIES, which is the NCR encoding, so the whole conversion can be done with that one call.
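A minimal sketch of the mbstring route follows. Note that the HTML-ENTITIES pseudo-encoding has been deprecated since PHP 8.2, so the sketch uses mb_encode_numericentity, a different mbstring function that produces numeric references; this substitution and the wrapper name ncr_encode are assumptions made for illustration. The call described above is kept as a comment for comparison.

```php
<?php
// Hypothetical wrapper name; mb_convert_encoding() and
// mb_encode_numericentity() are real functions of the mbstring extension.
function ncr_encode(string $str, string $encode): string
{
    // Convert to UTF-8 first so the code point range below applies.
    $utf8 = mb_convert_encoding($str, 'UTF-8', $encode);
    // Map format: [first code point, last code point, offset, mask].
    // Everything from U+0080 upward is turned into a numeric reference.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $map, 'UTF-8');
}

// The call described in the text, for comparison (deprecated since PHP 8.2):
// $html = mb_convert_encoding($str, 'HTML-ENTITIES', $encode);

echo ncr_encode("A\u{4E2D}", 'UTF-8'), "\n"; // prints A&#20013;
```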
