How many bytes does a Chinese character in PHP have?


藏色散人

Sep 16, 2019, 11:26 AM


## Characters

In JavaScript, strings are sequences of UTF-16 code units: a common Chinese character is one 16-bit unit (2 bytes), and so is an English letter, so each has a `length` of 1.

In PHP, strings are sequences of bytes, so the size depends on the encoding. In GBK/GB2312, a Chinese character occupies 2 bytes and an English letter occupies 1 byte; in UTF-8, a common Chinese character occupies 3 bytes.


PHP bytes and characters

In PHP, a Chinese character occupies 3 bytes under UTF-8 encoding and only 2 bytes under GBK encoding.
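The byte counts can be checked with a minimal sketch like the one below. It assumes the mbstring extension is loaded and PHP 7.0 or later (for the `\u{...}` escape); the variable names are illustrative only. `strlen()` counts bytes, while `mb_strlen()` counts characters in the encoding it is given.

```php
<?php
// "\u{4E2D}" is the character 中, written as an escape so the result
// does not depend on how this source file is saved.
$utf8 = "\u{4E2D}";

// Re-encode the same character as GBK.
$gbk = mb_convert_encoding($utf8, 'GBK', 'UTF-8');

$utf8Bytes = strlen($utf8);              // byte count in UTF-8
$gbkBytes  = strlen($gbk);               // byte count in GBK
$charCount = mb_strlen($utf8, 'UTF-8');  // character count

echo "UTF-8: {$utf8Bytes} bytes, GBK: {$gbkBytes} bytes, characters: {$charCount}\n";
// prints "UTF-8: 3 bytes, GBK: 2 bytes, characters: 1"
```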

Character

Characters are abstract entities that can be represented using many different encoding schemes or code pages. For example, UTF-16 represents characters as sequences of 16-bit integers, while UTF-8 represents the same characters as sequences of 8-bit bytes. The .NET common language runtime uses UTF-16 (Unicode Transformation Format, 16-bit) to represent characters.

Applications targeting the common language runtime use encoding to map characters from the native scheme to other schemes, and decoding to map characters from non-native schemes back to the native scheme.
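The same idea can be sketched in PHP. PHP strings are plain byte sequences with no built-in native scheme, so this example simply treats UTF-8 as the "native" one; that choice, like the variable names, is an assumption made for illustration. It requires the mbstring extension.

```php
<?php
$char = "\u{4E2D}"; // the character 中, held as UTF-8

// Encoding: map the character from UTF-8 to UTF-16 (big-endian, no BOM).
$utf16 = mb_convert_encoding($char, 'UTF-16BE', 'UTF-8');

$utf8Hex  = bin2hex($char);   // three 8-bit bytes
$utf16Hex = bin2hex($utf16);  // one 16-bit unit

// Decoding: map the UTF-16 bytes back to UTF-8.
$roundTrip = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16BE');

echo "UTF-8:  {$utf8Hex}\n";   // prints "UTF-8:  e4b8ad"
echo "UTF-16: {$utf16Hex}\n";  // prints "UTF-16: 4e2d"
```

The character is the same in both cases; only its byte representation changes, and decoding recovers the original string.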

Byte

A byte is the unit in which information is transmitted over a network or stored on disk or in memory.

In GBK/GB2312, an English letter (upper or lower case) occupies one byte and a Chinese character occupies two bytes; in UTF-8, a common Chinese character occupies three bytes.

Symbols: English punctuation occupies one byte. Chinese (full-width) punctuation occupies two bytes in GBK/GB2312 and typically three bytes in UTF-8.

A byte is a sequence of binary digits that the computer handles as one unit, generally 8 bits, so 1 byte = 8 bits. For example, one ASCII code fits in one byte.
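As a rough check of these sizes for letters and punctuation, the sketch below measures a few sample symbols in GBK and UTF-8. `byteLength()` is a hypothetical helper written for this example, not a PHP built-in, and the code assumes the mbstring extension.

```php
<?php
// Hypothetical helper: byte length of a UTF-8 string once converted to $encoding.
function byteLength(string $text, string $encoding): int
{
    return strlen(mb_convert_encoding($text, $encoding, 'UTF-8'));
}

$samples = [
    'letter A'          => 'A',
    'Chinese character' => "\u{4E2D}", // 中
    'English comma'     => ',',
    'Chinese comma'     => "\u{FF0C}", // full-width comma
];

$widths = [];
foreach ($samples as $label => $text) {
    // Store [bytes in GBK, bytes in UTF-8] for each sample.
    $widths[$label] = [byteLength($text, 'GBK'), byteLength($text, 'UTF-8')];
    printf("%-18s GBK: %d  UTF-8: %d\n", $label, $widths[$label][0], $widths[$label][1]);
}
```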

The key to understanding encoding is to accurately understand the concepts of characters and bytes. These two concepts are easily confused, so we will make a distinction here:

| Concept | Description | Example |
| --- | --- | --- |
| Character | A mark used by people; a symbol in the abstract sense. | '1', '中', 'a', '$', '￥', … |
| Byte | The unit of data storage in a computer: an 8-bit binary number, a concrete piece of storage space. | 0x01, 0x45, 0xFA, … |
| ANSI string | A string in memory whose characters are stored in an ANSI encoding, where one character may take one byte or several bytes. Also called a multi-byte string. | "中文123" (occupies 7 bytes) |
| UNICODE string | A string in memory whose characters are stored as their UNICODE code numbers. Also called a wide-character string. | L"中文123" (occupies 10 bytes) |
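The 7-byte and 10-byte figures can be reproduced with a short sketch. It assumes the example string is 中文123, that "ANSI" means GBK (as on a Chinese-locale Windows system), and that the wide string is UTF-16 without a terminator; none of these assumptions is stated in the original. It requires the mbstring extension.

```php
<?php
$text = "\u{4E2D}\u{6587}123"; // 中文123, held as UTF-8

// "ANSI" (GBK here): 2 + 2 bytes for the Chinese characters, 1 byte per digit.
$ansiBytes = strlen(mb_convert_encoding($text, 'GBK', 'UTF-8'));

// Wide string (UTF-16 here): 5 characters x 2 bytes each.
$wideBytes = strlen(mb_convert_encoding($text, 'UTF-16LE', 'UTF-8'));

// UTF-8, for comparison: 3 + 3 bytes for the Chinese characters, 1 byte per digit.
$utf8Bytes = strlen($text);

echo "ANSI (GBK): {$ansiBytes}, UTF-16: {$wideBytes}, UTF-8: {$utf8Bytes}\n";
// prints "ANSI (GBK): 7, UTF-16: 10, UTF-8: 9"
```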

Because different ANSI encodings define different standards, you must know which encoding a given multi-byte string uses before you can tell which characters it contains. For UNICODE strings, the characters represented are the same in every environment.
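To illustrate why the encoding must be known, the sketch below decodes the same four bytes under two different assumptions. The byte values are the GBK encoding of 中文; ISO-8859-1 is chosen only as an example of a wrong guess. It requires the mbstring extension.

```php
<?php
// Four raw bytes with no encoding label attached.
$bytes = "\xD6\xD0\xCE\xC4";

// Interpreted as GBK, the bytes form two Chinese characters.
$asGbk = mb_convert_encoding($bytes, 'UTF-8', 'GBK');

// Interpreted as ISO-8859-1, the same bytes form four accented Latin letters.
$asLatin1 = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');

echo $asGbk, "\n";     // prints "中文"
echo $asLatin1, "\n";  // prints "ÖÐÎÄ"
```

Neither result is wrong at the byte level; only knowledge of the intended encoding tells you which characters the string was meant to contain.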
