An in-depth explanation of the Chinese character encoding conversion method in PHP-PHP Tutorial-php.cn


WBOY

Jul 25, 2016 am 08:53 AM

This article introduces some background on Chinese character encoding conversion in PHP and analyzes the principles and methods behind it. Readers who need it may find it a useful reference.

Building on an understanding of the character set support introduced in MySQL 4.1, let's look at how PHP should adapt to this change in MySQL. The discussion also applies to MySQL 5 and later.

1. Principles

MySQL's character set support involves two concepts: the "character set" and the "collation".

1. Collations

In Chinese, "collation" is rendered with a word meaning "verification" or "proofreading". In web development the term comes up mainly in MySQL. Its main job is to tell MySQL how to compare characters. For example, for the ASCII character set a collation specifies that a is less than b, that a equals a, whether a equals A, and so on. You can usually ignore collations, because every character set has a default collation and the default is normally fine.

2. Character set

A character set is a broader concept. Even ordinary text files on Windows involve it. Different character sets define different ways of encoding characters: a character set is a set of symbols together with their encodings. For example, the ASCII character set contains digits, uppercase and lowercase letters, symbols such as the semicolon, and control characters such as the line feed. It represents each character with 7 bits (A is encoded as 65, a as 97 and b as 98).

ASCII only defines codes for English letters and symbols, so other languages cannot be represented in it. Different countries therefore created encodings for their own languages. China, for example, has GB2312. However, these national encodings differ from one another and cause cross-platform problems. International standards bodies therefore developed encodings that are accepted worldwide, and the most commonly used of these is UTF-8. ASCII covers only English letters and symbols. GB2312 covers English letters, symbols and Chinese characters. UTF-8 covers virtually every writing system in the world. So GB2312 includes all ASCII characters, and UTF-8 includes all GB2312 characters.

UTF-8 is thus the character set with the widest coverage. This is why multilingual web systems generally use it (phpMyAdmin, for example, uses UTF-8). Storing any text, whether in a database or in an ordinary text file, involves a character set.
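The short PHP sketch below makes the ASCII codes concrete and shows that one character can be stored as different bytes. It assumes the file is saved as UTF-8 and the mbstring extension is loaded. 'EUC-CN' is the name mbstring uses for the byte format usually labelled GB2312. The last two lines use PHP's strcmp() and strcasecmp() only as an analogy for case-sensitive and case-insensitive collations. They are not MySQL collations.

```php
<?php
// Sketch: assumes this file is saved as UTF-8 and mbstring is loaded.

// ASCII gives each character one 7-bit code.
$ascii = ['A' => ord('A'), 'a' => ord('a'), 'b' => ord('b')]; // 65, 97, 98

// The same Chinese character is stored as different bytes in different encodings.
// 'EUC-CN' is mbstring's name for the byte format normally labelled GB2312.
$utf8   = '中';
$gb2312 = mb_convert_encoding($utf8, 'EUC-CN', 'UTF-8');

printf("UTF-8 : %d bytes, %s\n", strlen($utf8), bin2hex($utf8));     // 3 bytes, e4b8ad
printf("GB2312: %d bytes, %s\n", strlen($gb2312), bin2hex($gb2312)); // 2 bytes, d6d0

// Analogy for collations: a case-sensitive rule says "a" != "A",
// a case-insensitive rule says they are equal.
var_dump(strcmp('a', 'A') === 0);     // bool(false)
var_dump(strcasecmp('a', 'A') === 0); // bool(true)
```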

Main terms:
Character: Chinese characters, English letters, punctuation marks, Latin letters, and so on.
Encoding: the way a character is converted into a storage format for the computer. For example, A is represented by 65.
Character set: a set of characters together with the corresponding encoding method.

a. MySQL character sets

MySQL supports multiple character sets and conversion between them, which helps portability and multilingual support. A character set can be set at the server, database, table and column levels. In the end, however, only columns that store characters actually use a character set. For example, if column col1 of table1 has a character type, col1 uses a character set. If column col2 of table1 is of type INT, no character set applies to col2. The server, database and table character sets are all just defaults for column character sets.

A MySQL server always has a character set. It can be specified at compile time, with a startup parameter, or in the configuration file. The server character set is only the default for databases. When you create a database you can specify its character set, and if you do not, the server's character set is used. Likewise, a table created without a character set uses the database's character set, and a column created without one uses the table's character set. Normally you only need to set the server-level character set, and the database, table and column levels inherit from it. Since UTF-8 has the widest coverage, we normally set the MySQL server character set to utf8. (In MySQL, the character set named utf8 stores at most three bytes per character and cannot hold characters such as emoji. utf8mb4, available since MySQL 5.5.3, covers the full Unicode range.)
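The inheritance chain described above can be modelled in a few lines. The effectiveCharset() function below is a hypothetical illustration, not part of any MySQL or PHP API. It mirrors the decision MySQL makes when each object is created. MySQL copies the default at creation time, so changing a database's default later does not change tables that already exist.

```php
<?php
// Hypothetical helper, not a MySQL API: models how MySQL picks a column's
// character set at the moment the column is created.
function effectiveCharset(
    bool $isCharacterColumn,
    ?string $columnCharset,
    ?string $tableCharset,
    ?string $databaseCharset,
    string $serverCharset
): ?string {
    // Non-character columns (INT, DATE, ...) have no character set at all.
    if (!$isCharacterColumn) {
        return null;
    }
    // The most specific explicit setting wins. Each unset level falls back
    // to the default of the level above it.
    return $columnCharset ?? $tableCharset ?? $databaseCharset ?? $serverCharset;
}

// Server default utf8, nothing else specified: a VARCHAR column inherits utf8.
var_dump(effectiveCharset(true, null, null, null, 'utf8'));     // string(4) "utf8"
// Table declared with gb2312: its columns use gb2312 unless overridden.
var_dump(effectiveCharset(true, null, 'gb2312', null, 'utf8')); // string(6) "gb2312"
// An INT column such as col2: no character set involved.
var_dump(effectiveCharset(false, null, null, null, 'utf8'));    // NULL
```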

b. Character sets in ordinary text

Storing any text involves a character set, and ordinary text files are no exception. On Windows 2000 and later, Notepad's "Save As..." dialog has an option for choosing the encoding used to store the text. Most people keep the default encoding, so they never run into character set problems. On Windows you can choose the encoding when you save a text file, but the encoding is detected automatically when the file is opened. There is a well-known Internet prank that uses Notepad on Windows 2000 and later to "play a trick on" China Mobile and China Unicom, and you can search for it. The effect comes from Windows guessing the wrong encoding when it opens the file.

Because automatic detection is sometimes wrong, some kinds of text file declare their own encoding. HTML files are one example. An HTML file is a text file, so it must be stored in some encoding, and HTML syntax also lets the file declare that encoding (for example, with a <meta> tag carrying a charset value). If the HTML file declares no encoding, the browser detects the encoding automatically. If it does declare one, the browser uses the declared encoding. Normally the charset declared in the HTML file matches the encoding the file is actually saved in, but the two can differ. When they do, the page is garbled. (This kind of garbling concerns only the text file and has nothing to do with the database.) Dedicated web editors such as Dreamweaver automatically save the file in the encoding given by the page's charset value.
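A PHP page can keep its declared encoding consistent with its saved encoding by declaring the encoding in one place. The sketch below assumes the file is saved as UTF-8, and PAGE_CHARSET and charsetMeta() are hypothetical names. It also sends the charset in the HTTP Content-Type header. Browsers give that header priority over a <meta> declaration, so the two should never disagree.

```php
<?php
// Sketch: assumes this file itself is saved as UTF-8 (no BOM).
const PAGE_CHARSET = 'utf-8';

// Builds the <meta> declaration for the page's charset.
function charsetMeta(string $charset): string
{
    return '<meta http-equiv="Content-Type" content="text/html; charset='
        . htmlspecialchars($charset, ENT_QUOTES) . '">';
}

// Under a web server, also send the charset in the HTTP header, which
// browsers prefer over <meta>. header() must run before any output.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/html; charset=' . PAGE_CHARSET);
}

$body = '中文';
// Guard: the literal text really is valid in the declared encoding.
if (!mb_check_encoding($body, PAGE_CHARSET)) {
    throw new RuntimeException('File encoding does not match ' . PAGE_CHARSET);
}

echo "<!DOCTYPE html>\n<html><head>", charsetMeta(PAGE_CHARSET), "</head>\n";
echo "<body>{$body}</body></html>\n";
```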

c. Character sets in PHP + MySQL

PHP ultimately produces a text file, but it also needs to read text from the database and write text into it. Because MySQL supports multiple character sets, by default it does not know which encoding the characters PHP sends are in. MySQL therefore requires the client (PHP) to tell it which character set it uses. By setting character_set_client, PHP tells MySQL the encoding of the text it sends for storage. By setting character_set_results, PHP tells MySQL which encoding it wants returned data in. By setting character_set_connection, PHP tells MySQL which encoding to use for the text in its queries.

MySQL stores text in its own configured encoding. Suppose MySQL stores text in setserver, PHP's character_set_client is setclient, and PHP's character_set_results is setresult. MySQL then converts text sent by PHP from setclient to setserver before storing it. When PHP reads text back, MySQL converts it from setserver to setresult before sending it to PHP. The PHP file (that is, the HTML it generates) has its own encoding. If the data MySQL returns is in a different encoding, the whole page is garbled. PHP should therefore tell MySQL which encoding it uses.

To avoid garbled text, three encodings must agree:
- the encoding the page is actually saved in
- the encoding declared in the HTML
- the encoding PHP reports to MySQL (character_set_client and character_set_results)

The first two usually agree if you write pages in an editor such as Dreamweaver, but they may differ if you write them in Notepad. The third must be set explicitly. In PHP this can be done with mysql_query("set names characterx"), where characterx is the character set name. SET NAMES sets character_set_client, character_set_connection and character_set_results in one statement.
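mysql_query() belongs to the old mysql extension, which was deprecated in PHP 5.5 and removed in PHP 7.0. With mysqli, mysqli::set_charset() plays the role of SET NAMES and also tells the client library which charset to use when escaping strings. The helper below is a hypothetical sketch. The connection details are placeholders, and it defaults to utf8mb4, MySQL's full-Unicode variant of utf8.

```php
<?php
// Hypothetical helper; host, user, password and database name are placeholders.
function connectWithCharset(
    string $host,
    string $user,
    string $password,
    string $database,
    string $charset = 'utf8mb4'
): mysqli {
    // Make mysqli throw exceptions instead of returning false.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);
    $link = new mysqli($host, $user, $password, $database);

    // On the server this has the same effect as "SET NAMES $charset"
    // (client, connection and results variables). It also informs the
    // client library, so real_escape_string() escapes with the right charset.
    $link->set_charset($charset);
    return $link;
}

// Usage (requires a running MySQL server and the mysqli extension):
// $db   = connectWithCharset('localhost', 'app', 'secret', 'shop', 'utf8mb4');
// $vars = $db->query("SHOW VARIABLES LIKE 'character_set_%'");
```

With PDO, the equivalent is to put charset=utf8mb4 in the DSN, which is supported since PHP 5.3.6.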

d. Character set conversion

Converting from a smaller character set to a larger one loses no data, but converting from a larger character set to a smaller one may lose data. For example, some UTF-8 characters do not exist in GB2312, so converting from UTF-8 to GB2312 can lose characters. There is one exception: converting text from GB2312 to UTF-8 and then back to GB2312 loses nothing. The text started out containing only GB2312 characters, so only GB2312 characters pass through the whole process. Because UTF-8 can represent all the world's characters, databases generally use UTF-8 encoding, which allows any character to be stored.
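The lossless round trip and the lossy case can both be checked with mbstring. This sketch assumes a UTF-8 source file. 'EUC-CN' is mbstring's name for GB2312 bytes, and the emoji stands in for any character GB2312 lacks. mbstring replaces characters it cannot convert with a substitute character (by default ?), which is exactly the kind of loss the paragraph describes.

```php
<?php
// Sketch using mbstring; 'EUC-CN' is mbstring's name for GB2312 bytes.
// Assumes this file is saved as UTF-8.

// Text containing only GB2312 characters survives the round trip.
$original = mb_convert_encoding('中文字符', 'EUC-CN', 'UTF-8'); // GB2312 bytes
$asUtf8   = mb_convert_encoding($original, 'UTF-8', 'EUC-CN');  // small -> large
$backToGb = mb_convert_encoding($asUtf8, 'EUC-CN', 'UTF-8');    // large -> small
var_dump($backToGb === $original); // bool(true)

// UTF-8 text with a character GB2312 lacks (an emoji here) loses it:
// the unconvertible character is replaced by the substitute character.
$utf8Text = "中文\u{1F600}";
$lossy = mb_convert_encoding(
    mb_convert_encoding($utf8Text, 'EUC-CN', 'UTF-8'),
    'UTF-8',
    'EUC-CN'
);
var_dump($lossy === $utf8Text); // bool(false)
```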

e. Garbled text in phpMyAdmin

phpMyAdmin supports multiple languages, which requires its HTML pages to be UTF-8 encoded. Because its pages are UTF-8, phpMyAdmin sets character_set_client and character_set_results to utf8 when it connects to MySQL. PHP tells MySQL its encoding after connecting, using SET NAMES (or one of a few similar statements). If no encoding is declared, latin1 is used. Most programs never declare character_set_client explicitly, so GB2312 text is stored in the database as if it were latin1. When phpMyAdmin then reads it as utf8, the result is inevitably garbled. If the PHP program stores data in the database with the correct encoding, there is no problem. So it is not phpMyAdmin that needs to be changed. (Modifying phpMyAdmin sometimes makes the garbling go away, but it does not fix the root cause.)
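The phpMyAdmin symptom can be reproduced without a database. This sketch assumes a UTF-8 source file with mbstring loaded. ISO-8859-1 stands in for MySQL's latin1, which is really cp1252; the two agree on the bytes used here. The sketch also shows why the data is recoverable: the bytes were stored intact, only under the wrong label.

```php
<?php
// Sketch of the phpMyAdmin symptom, simulated with mbstring.
// Assumes this file is saved as UTF-8; 'EUC-CN' is the GB2312 byte format.

// A GB2312 page sends the bytes of "中" (D6 D0) without SET NAMES,
// so MySQL treats them as two latin1 characters.
$gbBytes = mb_convert_encoding('中', 'EUC-CN', 'UTF-8');

// phpMyAdmin asks for utf8, so each "latin1" byte is converted on its own.
$shownInPhpMyAdmin = mb_convert_encoding($gbBytes, 'UTF-8', 'ISO-8859-1');
echo $shownInPhpMyAdmin, "\n"; // ÖÐ

// Undoing the latin1 step and decoding as GB2312 restores the character.
$recovered = mb_convert_encoding(
    mb_convert_encoding($shownInPhpMyAdmin, 'ISO-8859-1', 'UTF-8'),
    'UTF-8',
    'EUC-CN'
);
echo $recovered, "\n"; // 中
```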

2. Summary

1. Use utf8 storage for the database wherever possible. Edit /etc/my.cnf and add default-character-set=utf8 to the [mysqld] section. On MySQL 5.5 and later this server option has been removed, so use character-set-server=utf8 instead. Convert any existing database to utf8 first.
2. Before querying the database, have the PHP program execute mysql_query("set names xxxx");, where xxxx is the encoding of your web page (charset=xxxx). If the page has charset=utf8, then xxxx=utf8. If it has charset=gb2312, then xxxx=gb2312. If it has charset=ipaddr, then xxxx=ipaddr (just kidding, there is no such encoding). Almost every web program keeps its database connection code in one shared file, so you only need to add mysql_query("set names xxxx") in that file.
3. phpMyAdmin does not need to be changed.
4. Use a tool such as Dreamweaver to create your pages, so that each page's actual encoding (the one chosen in the Windows save dialog) matches its declared encoding (charset=?).
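Step 2 has one practical catch: MySQL's charset names do not always match HTML charset labels. A page usually declares charset=utf-8, but MySQL rejects SET NAMES utf-8 and expects utf8 or utf8mb4. The shared connection file can translate the label first. mysqlCharsetFor() and its mapping table are a hypothetical sketch that covers only a few common encodings.

```php
<?php
// Hypothetical helper for the shared connection file: translate the page's
// declared charset (charset=xxxx) into the name MySQL expects.
function mysqlCharsetFor(string $htmlCharset): string
{
    $map = [
        'utf-8'  => 'utf8mb4', // use 'utf8' on servers older than MySQL 5.5.3
        'utf8'   => 'utf8mb4',
        'gb2312' => 'gb2312',
        'gbk'    => 'gbk',
        'big5'   => 'big5',
    ];
    $key = strtolower(trim($htmlCharset));
    if (!isset($map[$key])) {
        // Unknown labels (such as the joke "ipaddr") are rejected loudly.
        throw new InvalidArgumentException("No MySQL charset mapped for '{$htmlCharset}'");
    }
    return $map[$key];
}

echo mysqlCharsetFor('UTF-8'), "\n";  // utf8mb4
echo mysqlCharsetFor('gb2312'), "\n"; // gb2312
```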

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
