search
HomeBackend DevelopmentPHP ProblemHow to use php to convert unicode and utf8?

How to use php to convert unicode and utf8?

Jul 10, 2020 am 11:03 AM
phpunicodeutf8

How to use PHP to convert unicode and utf8: First, there is no need to consider the [4-6] byte encoding; then [utf-8] characters with more than four bytes appear, which can be directly regarded as garbled characters. Just ignore it or convert it to unicode entity form, the code is [$utf8char = "{$c};"].

How to use php to convert unicode and utf8?

How to use php to convert unicode and utf8:

Unicode encoding is to implement utf-8 and gb series The basis for encoding (gb2312, gbk, gb18030) conversion. Although we can also directly make a comparison table from utf-8 to these encodings, few people will do this because the variable encoding of utf-8 is uncertain. Therefore, in general, the comparison table between unicode and gb encoding is used. Unicode (UCS-2) is actually the basic encoding of utf-8, and utf-8 is just an implementation of it. The two have the following correspondence:

  • Unicode symbol range | UTF-8 encoding method

  • ##u0000 0000 - u0000 007F | 0xxxxxxx

  • u0000 0080 - u0000 07FF | 110xxxxx 10xxxxxx

  • ##u0000 0800 - u0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
  • Due to the characters currently used by utf-8 They are all in UCS-2, so there is no need to consider the 4-6 byte encoding. Similarly, during reverse conversion, if more than four bytes of UTF-8 characters appear, they can be directly regarded as garbled characters. Ignore or convert to unicode entity form ("long int;" form), and then hand it over to the browser or related parsing program for processing. The algorithm for converting unicode to utf-8 encoding using PHP is as follows:
/*
 * 参数 $c 是unicode字符编码的int类型数值,如果是用二进制读取的数据,在php中通常要用 hexdec(bin2hex( $bin_unichar )) 这样转换
 */
function uni2utf8( $c )
{
  if ($c < 0x80)
  {
        $utf8char = chr($c);
  }
  else if ($c < 0x800)
  {
        $utf8char = chr(0xC0 | $c >> 0x06).chr(0x80 | $c & 0x3F);
  }
  else if ($c < 0x10000)
  {
        $utf8char = chr(0xE0 | $c >> 0x0C).chr(0x80 | $c >> 0x06 & 0x3F).chr(0x80 | $c & 0x3F);
  }
  //因为UCS-2只有两字节,所以后面的情况是不可能出现的,这里只是说明unicode HTML实体编码的用法。
  else
  {
        $utf8char = "&#{$c};";
  }
  return $utf8char;
}

Within the current environment, it can be considered that utf-8 character set == unicode (UCS-2), but in theory, the inclusion relationship of the main character set relationships is as follows:

utf-8 > unicode(UCS-2) > gb18030 > gbk > gb2312

Therefore, if encoding When all are correct:

gb2312 => gbk => gb18030 => unicode(UCS-2) => utf-8

Such a transformation process is basically lossless, but on the contrary, by

utf-8 => unicode(UCS-2) => gb18030=> gbk => gb2312

In such a conversion process, it is very likely that there will be unrecognizable characters. Therefore, if the system uses UTF-8 encoding, try not to reverse the encoding operation easily.

Related learning recommendations:
PHP programming from entry to proficiency

The above is the detailed content of How to use php to convert unicode and utf8?. For more information, please follow other related articles on the PHP Chinese website!

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
ACID vs BASE Database: Differences and when to use each.ACID vs BASE Database: Differences and when to use each.Mar 26, 2025 pm 04:19 PM

The article compares ACID and BASE database models, detailing their characteristics and appropriate use cases. ACID prioritizes data integrity and consistency, suitable for financial and e-commerce applications, while BASE focuses on availability and

PHP Secure File Uploads: Preventing file-related vulnerabilities.PHP Secure File Uploads: Preventing file-related vulnerabilities.Mar 26, 2025 pm 04:18 PM

The article discusses securing PHP file uploads to prevent vulnerabilities like code injection. It focuses on file type validation, secure storage, and error handling to enhance application security.

PHP Input Validation: Best practices.PHP Input Validation: Best practices.Mar 26, 2025 pm 04:17 PM

Article discusses best practices for PHP input validation to enhance security, focusing on techniques like using built-in functions, whitelist approach, and server-side validation.

PHP API Rate Limiting: Implementation strategies.PHP API Rate Limiting: Implementation strategies.Mar 26, 2025 pm 04:16 PM

The article discusses strategies for implementing API rate limiting in PHP, including algorithms like Token Bucket and Leaky Bucket, and using libraries like symfony/rate-limiter. It also covers monitoring, dynamically adjusting rate limits, and hand

PHP Password Hashing: password_hash and password_verify.PHP Password Hashing: password_hash and password_verify.Mar 26, 2025 pm 04:15 PM

The article discusses the benefits of using password_hash and password_verify in PHP for securing passwords. The main argument is that these functions enhance password protection through automatic salt generation, strong hashing algorithms, and secur

OWASP Top 10 PHP: Describe and mitigate common vulnerabilities.OWASP Top 10 PHP: Describe and mitigate common vulnerabilities.Mar 26, 2025 pm 04:13 PM

The article discusses OWASP Top 10 vulnerabilities in PHP and mitigation strategies. Key issues include injection, broken authentication, and XSS, with recommended tools for monitoring and securing PHP applications.

PHP XSS Prevention: How to protect against XSS.PHP XSS Prevention: How to protect against XSS.Mar 26, 2025 pm 04:12 PM

The article discusses strategies to prevent XSS attacks in PHP, focusing on input sanitization, output encoding, and using security-enhancing libraries and frameworks.

PHP Interface vs Abstract Class: When to use each.PHP Interface vs Abstract Class: When to use each.Mar 26, 2025 pm 04:11 PM

The article discusses the use of interfaces and abstract classes in PHP, focusing on when to use each. Interfaces define a contract without implementation, suitable for unrelated classes and multiple inheritance. Abstract classes provide common funct

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Safe Exam Browser

Safe Exam Browser

Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.

Atom editor mac version download

Atom editor mac version download

The most popular open source editor

EditPlus Chinese cracked version

EditPlus Chinese cracked version

Small size, syntax highlighting, does not support code prompt function

SecLists

SecLists

SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.