An in-depth explanation of the Chinese character encoding conversion method in PHP

This article covers Chinese character encoding conversion in PHP and analyzes the principles and methods behind it. Readers who need it can use it as a reference.

Starting from how character sets work in MySQL 4.1, this article explains how PHP should adapt to that change in MySQL. The discussion also applies to MySQL 5 and later versions.

1. Principles

MySQL's character handling rests on two concepts: the "character set" and the "collation".

1) Collations

"Collation" is sometimes rendered in translation as "verification". In web development the term is used almost only in MySQL. Its main function is to tell MySQL how to compare characters. For example, a collation for the ascii character set stipulates that a is less than b, that a is equal to a, and whether a is equal to A. Each character set has a default collation, so you can usually ignore collations and just use the default.

2) Character set

In contrast, character set is a broader concept. Even ordinary text files under Windows involve character set issues. Different character sets specify different character encoding methods.

A character set is a set of symbols together with their encodings. For example, the ASCII character set includes digits, uppercase and lowercase letters, and symbols such as the semicolon and the line feed. Each character is represented with 7 bits (A is encoded as 65, and b as 98).

ASCII only defines encodings for English letters and symbols, so other languages cannot be represented in it. For this reason, different countries created encodings for their own languages; China, for example, has gb2312. However, these national encodings differ from one another, and there are also cross-platform problems. International standards organizations therefore developed internationally accepted encodings, of which utf8 is the most commonly used.

ascii encodes only English symbols and English letters; gb2312 encodes English symbols, English letters, and Chinese characters; utf8 encodes the characters of all languages in the world. Therefore the gb2312 characters include the ascii characters, and the utf8 characters include the gb2312 characters (as a set of characters; the bytes used for a Chinese character differ between gb2312 and utf8).

It follows that utf8 covers the widest range of characters. For this reason, multi-language web systems generally use the utf8 character set (phpmyadmin uses utf8 encoding). The storage of any text involves a character set, in databases as well as in ordinary text files.
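The relationship between the three character sets can be seen by looking at the bytes each one uses. The following is a minimal sketch, assuming PHP's iconv extension is available and that it knows the name GB2312; the helper hexIn() is hypothetical and exists only for this illustration.

```php
<?php
// "中" (U+4E2D) written as explicit UTF-8 bytes, so the example does not
// depend on the encoding this source file happens to be saved in.
$zhongUtf8 = "\xE4\xB8\xAD";

// Hypothetical helper: hex dump of a UTF-8 string after conversion to $charset.
function hexIn(string $charset, string $utf8Text): string
{
    return bin2hex(iconv('UTF-8', $charset, $utf8Text));
}

$asciiA    = ord('A');                     // 65
$asciiB    = ord('b');                     // 98
$utf8Hex   = bin2hex($zhongUtf8);          // "e4b8ad": three bytes in utf8
$gb2312Hex = hexIn('GB2312', $zhongUtf8);  // "d6d0": two bytes in gb2312
$aInGb2312 = hexIn('GB2312', 'A');         // "41": identical to ASCII
$aInUtf8   = bin2hex('A');                 // "41": identical to ASCII
```

The letter A keeps the same single byte in all three encodings, while the Chinese character needs two bytes in gb2312 and three different bytes in utf8.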

Main terms:

Characters: Chinese characters, English letters, punctuation marks, Latin letters, etc.
Encoding: the conversion of characters into a format the computer can store; for example, A is represented by 65.
Character set: a set of characters together with the corresponding encoding method.

a. MySQL character set

MySQL supports multiple character sets and can convert between them (which helps portability and multi-language support). A character set can be set at the server level, the database level, the table level, and the column level.

The place where a character set is finally used is a column that stores characters. For example, if the col1 column in table1 is a character type, col1 uses a character set. If the col2 column of table1 is of type int, the concept of a character set does not apply to col2. The server-level, database-level, and table-level character sets are all just defaults for the column character set.

MySQL always has a server character set. It can be specified at compile time, with a startup parameter, or in the configuration file. The server character set is only the default for databases: when creating a database you can specify its character set, and if you do not, the server's character set is used. Similarly, when creating a table you can specify the table-level character set; if you do not, the database character set is used. When creating a column you can specify the column's character set; if you do not, the table's character set is used.

Normally you only need to set the server-level character set. The database-level, table-level, and column-level character sets are then inherited from it. Since utf8 covers the widest range of characters, the server-level character set is normally set to utf8.
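The fallback chain described above can be sketched in a few lines. This is only an illustration of the rule, not MySQL code; effectiveCharset() and its parameters are hypothetical names.

```php
<?php
// Hypothetical helper (not a MySQL API): pick the character set that applies
// to a character column, using the most specific level that was specified.
function effectiveCharset(
    string $server,
    ?string $database = null,
    ?string $table = null,
    ?string $column = null
): string {
    return $column ?? $table ?? $database ?? $server;
}

// Only the server level is set: every character column ends up as utf8.
$inherited = effectiveCharset('utf8');

// A column-level setting overrides all the defaults above it.
$overridden = effectiveCharset('utf8', null, 'utf8', 'gb2312');
```

In SQL, these levels correspond to the CHARACTER SET clause that can be given in CREATE DATABASE, in CREATE TABLE, and in an individual column definition.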

b. Character set issues for ordinary text

The storage of any text involves a character set, and ordinary text files are no exception. On Windows 2000 and later, the "Save as..." dialog box of Notepad has an option for choosing the encoding used to store the text. Most people use the default encoding, so they rarely run into character set problems.

Under Windows you can choose the encoding when saving a text file, but when a text file is opened, the encoding is determined automatically. There is a well-known joke on the Internet about Windows 2000+ Notepad and the names China Mobile and China Unicom; you can search for it. The problem is caused by Windows guessing the wrong encoding when it opens the text file.

Because automatic detection is sometimes wrong, some text formats declare the encoding they use. html files are one example. An html file is a text file, so it is stored in some encoding, and html syntax is also used inside the file to declare that encoding (for example, a meta tag containing charset=xxxx). If the html file does not declare an encoding, the browser detects it automatically. If it does, the browser uses the declared encoding.

Normally the charset declared in the HTML file matches the encoding the file is actually stored in, but the two can differ. If they differ, the web page will be garbled (this kind of garbling concerns only the text file and has nothing to do with the database). Specialized web page editing tools (such as Dreamweaver) automatically encode the file according to the charset value in the web page.
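One way to keep the declared charset and the stored bytes in step is to generate both from the same value. The sketch below assumes the iconv extension and text that starts out as UTF-8; renderPage() is a hypothetical helper, and the meta tag shown is one common way of declaring the charset.

```php
<?php
// Hypothetical helper: build a page whose declared charset matches the
// encoding of the bytes it actually contains. $bodyUtf8 must be UTF-8.
function renderPage(string $charset, string $bodyUtf8): string
{
    $html = '<html><head><meta http-equiv="Content-Type" '
          . 'content="text/html; charset=' . $charset . '"></head>'
          . '<body>' . $bodyUtf8 . '</body></html>';

    // Convert the whole page so that the declaration and the bytes agree.
    return iconv('UTF-8', $charset, $html);
}

$page = renderPage('GB2312', "\xE4\xB8\xAD");  // the body is "中"
```

A script serving such a page would typically also send the same charset in a Content-Type header via header(), so the browser never has to guess.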

c. Character set problem of php+mysql

What PHP ultimately generates is a text file, but it has to retrieve text from the database or store text into it. Since MySQL supports multiple character sets, MySQL does not know by default which encoding PHP uses for the characters it sends. MySQL therefore requires the client (php) to say which character set it uses.

By setting character_set_client, php tells mysql which encoding the data it sends to the database is in. By setting character_set_results, php tells mysql which encoding it wants the returned data to be in. By setting character_set_connection, php tells mysql which encoding to use for the text in the query. mysql stores text in the character set configured for the column.

Assume that mysql stores text as setserver, PHP's character_set_client is setclient, and PHP's character_set_results is setresult. Then mysql converts the text sent from php from setclient to setserver and stores it in the database. When php retrieves the text, mysql converts it from setserver to setresult and sends it to php.

The php file (that is, the html finally generated) has an encoding of its own. If the encoding of the data passed back by mysql differs from the encoding of the php file, the whole web page will be garbled. Therefore PHP generally tells MySQL its own encoding.

To avoid garbled text, three encodings must agree: first, the encoding of the web page itself; second, the encoding declared in the HTML; third, the encoding that PHP announces to mysql (character_set_client and character_set_results). The first and second usually agree if you write the page in an editor such as Dreamweaver (dw), but they may differ if you write it in Notepad. The third has to be announced to mysql explicitly, which can be done in php with mysql_query("set names characterx"). (The mysql_* functions were removed in PHP 7; with mysqli the equivalent call is mysqli_set_charset().)
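As a rough sketch of that announcement step, the code below derives the MySQL charset name from the charset declared in the page. mysqlCharsetFor() and announceCharset() are hypothetical helpers, the name map is deliberately short, and the mysqli connection is assumed to be opened elsewhere.

```php
<?php
// Hypothetical helper: translate the charset declared in the HTML page into
// the name MySQL expects. Only a few illustrative names are listed here.
function mysqlCharsetFor(string $htmlCharset): string
{
    $map = [
        'utf-8'  => 'utf8',
        'utf8'   => 'utf8',
        'gb2312' => 'gb2312',
        'gbk'    => 'gbk',
    ];
    $key = strtolower(trim($htmlCharset));
    if (!isset($map[$key])) {
        throw new InvalidArgumentException("Unsupported charset: $htmlCharset");
    }
    return $map[$key];
}

// Sketch of the announcement; $db is an already opened mysqli connection.
function announceCharset(mysqli $db, string $htmlCharset): bool
{
    // set_charset() sets the client, connection, and results character sets.
    return $db->set_charset(mysqlCharsetFor($htmlCharset));
}

// The statement the article issues by hand would look like this:
$setNames = 'SET NAMES ' . mysqlCharsetFor('UTF-8');
```

Restricting the name to a fixed list also keeps arbitrary input out of the SET NAMES statement if the charset value ever comes from configuration or user data.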

d. Character set conversion problem

Converting from a small character set to a large one loses no data, but converting from a large character set to a small one may. For example, some utf8 characters do not exist in gb2312, so they may be lost when converting from utf8 to gb2312.

There is one exception: converting from gb2312 to utf8 and then from utf8 back to gb2312 loses nothing, because every character in the text came from gb2312 in the first place, so only gb2312 characters are ever converted.

Because utf8 can represent all the characters in the world, databases generally use utf8. Any character can then be stored in a utf8-encoded database.
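The difference between a safe and a lossy conversion can be checked with a round trip. This is a minimal sketch assuming the iconv extension; survivesRoundTrip() is a hypothetical helper, and the exact failure behavior of iconv() (returning false or substituting a placeholder) depends on the underlying library.

```php
<?php
// Hypothetical helper: does UTF-8 text survive conversion to $charset and back?
function survivesRoundTrip(string $utf8Text, string $charset): bool
{
    // iconv() may return false or substitute a placeholder for characters
    // the target charset lacks; the @ hides the notice it can raise.
    $narrow = @iconv('UTF-8', $charset, $utf8Text);
    if ($narrow === false) {
        return false;
    }
    return @iconv($charset, 'UTF-8', $narrow) === $utf8Text;
}

$chinese = "\xE4\xB8\xAD\xE6\x96\x87";  // "中文": both characters exist in gb2312
$korean  = "\xED\x95\x9C";              // "한" (U+D55C): not part of gb2312

$chineseOk = survivesRoundTrip($chinese, 'GB2312');  // true
$koreanOk  = survivesRoundTrip($korean, 'GB2312');   // false
```

Text that originated in gb2312 passes the check, which is the exception described above; text containing characters outside gb2312 does not.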

e. phpmyadmin garbled text problem

phpmyadmin supports multiple languages, which requires its html pages to use utf8 encoding. That in turn requires phpmyadmin to use utf8 for character_set_client and character_set_results when it connects to mysql.

At present, PHP can only announce its encoding to MySQL with set names (or a few equivalent statements) after connecting. If no encoding is declared explicitly, latin1 is used. Many programs never set character_set_client, so their gb2312 text is stored in the database as if it were latin1, and when phpmyadmin reads it as utf8 it is bound to be garbled.

If the PHP program stores its data with the correct encoding declared, there is no problem. So phpmyadmin is not what needs to be modified. (Modifying phpmyadmin can sometimes make the garbled text go away, but that does not address the root of the problem.)
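The effect can be reproduced without a database by applying the same conversions by hand. This sketch assumes the iconv extension and treats MySQL's latin1 as ISO-8859-1, which is a simplification that happens to hold for the two bytes used here.

```php
<?php
$gb2312Bytes = "\xD6\xD0";  // "中" as a gb2312 program sends it

// The connection declared nothing, so the bytes are taken for latin1; a utf8
// client such as phpmyadmin then receives them converted from latin1 to utf8.
$seenByUtf8Client = iconv('ISO-8859-1', 'UTF-8', $gb2312Bytes);  // "ÖÐ", not "中"

// The original bytes are still intact underneath, so the step can be undone.
$recoveredBytes = iconv('UTF-8', 'ISO-8859-1', $seenByUtf8Client);
$recoveredText  = iconv('GB2312', 'UTF-8', $recoveredBytes);     // "中" in utf8
```

The data is not destroyed, only mislabeled, which is why the fix belongs in the program that wrote it rather than in phpmyadmin.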

2. Summary

1. Store the database in utf8 where possible: modify /etc/my.cnf and add default-character-set=utf8 to the [mysqld] section (on MySQL 5.5 and later, the [mysqld] option is named character-set-server). An existing database must first be converted to utf8.
2. Before querying the database, the PHP program executes mysql_query("set names xxxx"); where xxxx is the encoding of your web page (charset=xxxx). If the web page has charset=utf8, then xxxx=utf8; if it has charset=gb2312, then xxxx=gb2312; if it has charset=ipaddr, then xxxx=ipaddr (just kidding, there is no such encoding). Almost every web program keeps its database connection code in one shared file; just add mysql_query("set names xxxx") to that file.
3. phpmyadmin does not need to be changed.
4. To make sure the actual encoding of the web page (the encoding chosen in the Windows save dialog box) matches its declared encoding (charset=?), create the page with a tool such as Dreamweaver (dw).
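Point 4 can also be checked programmatically for the utf8 case. The following is a heuristic sketch only: utf8DeclarationHolds() is a hypothetical helper, it handles just a utf-8 declaration, and it relies on the fact that preg_match() with the /u modifier fails on input that is not valid UTF-8.

```php
<?php
// Hypothetical check: if a page declares utf-8, are its bytes valid UTF-8?
// Returns null when no utf-8 declaration is found.
function utf8DeclarationHolds(string $html): ?bool
{
    if (!preg_match('/charset=["\']?utf-?8/i', $html)) {
        return null;
    }
    // With the /u modifier, preg_match() fails on invalid UTF-8 input.
    return preg_match('//u', $html) === 1;
}

$meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';

$consistent   = utf8DeclarationHolds($meta . "\xE4\xB8\xAD");  // utf8 bytes
$inconsistent = utf8DeclarationHolds($meta . "\xD6\xD0");      // gb2312 bytes
$undeclared   = utf8DeclarationHolds('<p>plain</p>');
```

A file saved as gb2312 but declaring utf-8 fails the check, which is the mismatch that produces a garbled page.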


