PHP multibyte string
Introduction
Although every necessary character in many languages can be mapped one-to-one to an 8-bit value, several languages require so many characters for written communication that they cannot fit within the range of a single byte. (A byte consists of 8 bits. Each bit can hold only one of two values, 1 or 0, so one byte can represent only 256 different values, that is, 2 to the power of 8.) Multibyte character encoding schemes were developed to express more than 256 characters in conventional byte-based encoding systems.
When you manipulate (trim, split, splice, etc.) strings in a multibyte encoding, you need to use special functions, because in such encodings two or more consecutive bytes may represent a single character. If you apply a function that is not multibyte-aware to such a string, it may fail to detect where a multibyte character begins and produce a garbled string that has lost its original meaning.
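The difference between byte-based and multibyte-aware functions can be illustrated with a minimal sketch. It assumes that the mbstring extension is loaded and that the script file is saved as UTF-8; the sample string is only an illustration.

```php
<?php
// Assumes the mbstring extension is loaded and this file is saved as UTF-8.
$text = "日本語"; // three characters, nine bytes in UTF-8

echo strlen($text), "\n";             // counts bytes: 9
echo mb_strlen($text, 'UTF-8'), "\n"; // counts characters: 3

// substr() works on bytes, so it cuts in the middle of the second character.
$broken = substr($text, 0, 4);

// mb_substr() works on characters, so it returns the first two whole characters.
$intact = mb_substr($text, 0, 2, 'UTF-8');

var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false): the string is no longer valid UTF-8
var_dump(mb_check_encoding($intact, 'UTF-8')); // bool(true)
```

The byte-based cut leaves a dangling lead byte at the end of the string, which is the kind of garbling described above.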
mbstring provides multibyte-specific string functions that help you deal with multibyte encodings in PHP. In addition, mbstring handles character encoding conversion between the possible encoding pairs. mbstring is designed to handle Unicode-based encodings such as UTF-8 and UCS-2 and, for convenience, many single-byte encodings.
mbstring is a non-default extension, which means it is not enabled by default. You must enable the module explicitly with a configure option (--enable-mbstring) when building PHP.
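As a rough illustration of encoding conversion, the following sketch converts a short UTF-8 string to a single-byte encoding and to UCS-2, then back again. The sample string and the choice of ISO-8859-1 and UCS-2BE are assumptions made for the example, not something the text prescribes.

```php
<?php
// Assumes the mbstring extension is loaded and this file is saved as UTF-8.
$utf8 = "café"; // 4 characters, 5 bytes in UTF-8 ("é" takes 2 bytes)

// Convert to a single-byte encoding: every character becomes exactly one byte.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// Convert to big-endian UCS-2: every character becomes exactly two bytes.
$ucs2 = mb_convert_encoding($utf8, 'UCS-2BE', 'UTF-8');

// Converting back restores the original UTF-8 bytes.
$roundTrip = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

printf(
    "UTF-8: %d bytes, ISO-8859-1: %d bytes, UCS-2BE: %d bytes\n",
    strlen($utf8),
    strlen($latin1),
    strlen($ucs2)
);
var_dump($roundTrip === $utf8); // bool(true)
```

Passing the source encoding explicitly as the third argument avoids depending on the configured internal encoding.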
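Because the extension may be absent, a script can check for it before calling any mb_* function. This is only a sketch; mbstringAvailable() is a hypothetical helper name introduced for the example.

```php
<?php
// mbstringAvailable() is a hypothetical helper name used only in this sketch.
function mbstringAvailable(): bool
{
    return extension_loaded('mbstring');
}

if (mbstringAvailable()) {
    // Safe to call mb_* functions from here on.
    echo "mbstring is enabled; internal encoding: ", mb_internal_encoding(), "\n";
} else {
    echo "mbstring is not enabled; PHP needs to be built with --enable-mbstring.\n";
}
```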
HTTP Input and Output
HTTP input/output character encoding conversion may also convert binary data. If binary data is used for HTTP input/output, you are expected to control the character encoding conversion yourself.
Since PHP 4.3.3, if the enctype attribute of the HTML form is set to multipart/form-data and mbstring.encoding_translation is set to On in php.ini, POST variables and the names of uploaded files are also converted to the internal character encoding. However, the conversion is not applied to the query keys.
HTTP input
There is no way to control HTTP input character conversion from within a PHP script. To disable HTTP input character conversion, the setting must be made in php.ini.
Example #1 Disabling HTTP input conversion in php.ini
;; Disable HTTP input conversion
mbstring.http_input = pass
;; Disable HTTP input conversion (PHP 4.3.0 or later)
mbstring.encoding_translation = Off
When PHP runs as an Apache module, these settings can also be overridden per virtual host in httpd.conf or per directory with .htaccess.
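Since these values can differ per virtual host or directory, one way to see which ones are in effect for a given request is to read them back at runtime. The sketch below assumes nothing beyond ini_get(); reportHttpInputSettings() is a hypothetical helper name.

```php
<?php
// reportHttpInputSettings() is a hypothetical helper name used only in this sketch.
function reportHttpInputSettings(): array
{
    return [
        'mbstring.http_input'           => ini_get('mbstring.http_input'),
        'mbstring.encoding_translation' => ini_get('mbstring.encoding_translation'),
    ];
}

foreach (reportHttpInputSettings() as $name => $value) {
    // ini_get() returns false when the directive does not exist,
    // for example when mbstring is not loaded.
    printf("%s = %s\n", $name, $value === false ? '(not available)' : var_export($value, true));
}
```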
HTTP output
There are several ways to enable HTTP output character encoding conversion. One is to use php.ini; another is to call ob_start() with mb_output_handler() as its callback function.
Example #2 php.ini setting example
;; Enable output character encoding conversion for all PHP pages
;; Enable output buffering
output_buffering = On
;; Set mb_output_handler to perform the output conversion
output_handler = mb_output_handler
Example #3 Script example
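<?php
// Enable output character encoding conversion for this page only

// Set the HTTP output character encoding to SJIS
mb_http_output('SJIS');

// Start buffering and specify "mb_output_handler" as the callback function
ob_start('mb_output_handler');
?>
Multibyte string functions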
mb_check_encoding — Check whether the string is valid in the specified encoding
mb_convert_case — Perform case folding on a string
mb_convert_encoding — Convert character encodings
mb_convert_kana — Convert "kana" from one form to another ("zen-kaku", "han-kaku" and more)
mb_convert_variables — Convert character encodings of one or more variables
mb_decode_mimeheader — Decode the string in the MIME header field
mb_decode_numericentity — Decode HTML numeric string references to characters
mb_detect_encoding — Detect the character encoding of a string
mb_detect_order — Set/get the detection order of character encoding
mb_encode_mimeheader — Encode a string for a MIME header
mb_encode_numericentity — Encode character to HTML numeric string reference
mb_encoding_aliases — Get aliases of a known encoding type
mb_ereg_match — Regular expression match for multibyte string
mb_ereg_replace_callback — Perform a regular expression search and replace with multibyte support using a callback
mb_ereg_replace — Replace regular expression with multibyte support
mb_ereg_search_getpos — Returns start point for next regular expression match
mb_ereg_search_getregs — Retrieve the result from the last multibyte regular expression match
mb_ereg_search_init — Setup string and regular expression for a multibyte regular expression match
mb_ereg_search_pos — Returns position and length of a matched part of the multibyte regular expression for a predefined multibyte string
mb_ereg_search_regs — Returns the matched part of a multibyte regular expression
mb_ereg_search_setpos — Set start point of next regular expression match
mb_ereg_search — Multibyte regular expression match for predefined multibyte string
mb_ereg — Regular expression match with multibyte support
mb_eregi_replace — Replace regular expression with multibyte support ignoring case
mb_eregi — Regular expression match ignoring case with multibyte support
mb_get_info — Get the internal settings of mbstring
mb_http_input — Detect HTTP input character encoding
mb_http_output — Set/get HTTP output character encoding
mb_internal_encoding — Set/get the internal character encoding
mb_language — Set/get the current language
mb_list_encodings — Return an array of all supported encodings
mb_output_handler — Callback function for converting character encoding in the output buffer
mb_parse_str — Parse GET/POST/COOKIE data and set global variables
mb_preferred_mime_name — Get the MIME charset string
mb_regex_encoding — Set/Get character encoding for multibyte regex
mb_regex_set_options — Set/Get the default options for mbregex functions
mb_send_mail — Send encoded mail
mb_split — Split a multi-byte string using regular expressions
mb_strcut — Get part of a string
mb_strimwidth — Get a string truncated by a specified width
mb_stripos — Find the position of the first occurrence of a string in another string, case-insensitively
mb_stristr — Find the first occurrence of a string in another string, case-insensitively
mb_strlen — Get the length of a string
mb_strpos — Find the position of the first occurrence of a string in another string
mb_strrchr — Find the last occurrence of the specified character in another string
mb_strrichr — Find the last occurrence of the specified character in another string in a case-insensitive manner
mb_strripos — Find the position of the last occurrence of a string in another string, case-insensitively
mb_strrpos — Find the position of the last occurrence of a string in another string
mb_strstr — Find the first occurrence of a string in another string
mb_strtolower — Make a string lowercase
mb_strtoupper — Make a string uppercase
mb_strwidth — Return the width of the string
mb_substitute_character — Set/get the substitution character
mb_substr_count — Count the number of substring occurrences
mb_substr — Get part of a string
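A few of the functions listed above can be tried together in a short sketch. It assumes that the mbstring extension is loaded and that the script file is saved as UTF-8; the sample strings are illustrative only.

```php
<?php
// Assumes the mbstring extension is loaded and this file is saved as UTF-8.
mb_internal_encoding('UTF-8');

$text = "café déjà vu";

// Case conversion
$upper = mb_strtoupper($text);                  // "CAFÉ DÉJÀ VU"
$title = mb_convert_case($text, MB_CASE_TITLE); // "Café Déjà Vu"

// Searching: mb_* positions are counted in characters, not bytes
$charPos = mb_strpos($text, "déjà");      // 5
$bytePos = strpos($text, "déjà");         // 6, because "é" occupies two bytes
$count   = mb_substr_count($text, "é");   // 2
$word    = mb_substr($text, $charPos, 4); // "déjà"

// Display width: East Asian wide characters count as two columns each
$width = mb_strwidth("日本語"); // 6

// Truncate to at most 8 columns, including the trim marker
$short = mb_strimwidth("日本語テキスト", 0, 8, "...");

// mb_strcut() takes byte offsets but never splits a character
$cut = mb_strcut("日本語", 0, 4); // "日" (3 bytes; the 4th byte would split the next character)

echo $upper, "\n", $title, "\n", $word, "\n", $short, "\n", $cut, "\n";
```

Setting the internal encoding once with mb_internal_encoding() lets the later calls omit the encoding argument; passing it explicitly on each call is an equally valid choice.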