PHP curl gzip garbled characters

WBOY
2023-05-06 21:21:09

As the Web has grown, so has the use of Web services. PHP, a popular open source programming language, is widely used for Web development, and PHP developers commonly use the cURL library to send HTTP requests and receive HTTP responses. Many servers compress their responses with gzip, which saves bandwidth and transfer time and improves the performance of web applications. However, when a gzip-compressed response body is not handled correctly on the PHP side, the result appears as garbled characters. This article focuses on how to solve the problem of garbled gzip responses in PHP curl.

I. Gzip compression algorithm

Gzip is a lossless compression format that is often used to compress Web resources such as HTML, CSS, and JavaScript files. Compressing these files for transmission removes much of their redundancy, which reduces transfer time and bandwidth requirements. All major web browsers and servers support gzip, because it is one of the standard content codings of HTTP/1.1: the client advertises support with the Accept-Encoding request header, and the server marks a compressed body with Content-Encoding: gzip.

Gzip uses the DEFLATE algorithm, which combines LZ77 matching with Huffman coding. LZ77 replaces repeated byte sequences with short references to earlier occurrences, and Huffman coding then assigns shorter bit codes to the more frequent symbols, using a code table built for the data being compressed. Because frequent symbols take fewer bits than they would in a fixed-length encoding, the output is smaller, which is one reason gzip is efficient. A gzip file begins with a header of at least 10 bytes whose first two bytes are the identifier 0x1f 0x8b; the DEFLATE-compressed data follows, and the file ends with a CRC-32 checksum and the uncompressed length. Gzip works on raw bytes and is indifferent to the character encoding of the text it compresses.
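The container layout described above can be checked from PHP itself. The following is a minimal sketch, assuming the zlib extension is loaded (it provides gzencode() and gzdecode()); the sample string and variable names are only illustrative.

```php
<?php
// Compress a repetitive UTF-8 string that contains Chinese characters.
$original   = str_repeat('你好，world! ', 20);
$compressed = gzencode($original);

// The first two bytes are the gzip identifier 0x1f 0x8b,
// the third byte is the compression method (8 = DEFLATE).
$magic = bin2hex(substr($compressed, 0, 3));

// Decompression restores the exact original bytes,
// regardless of the character encoding of the text.
$restored = gzdecode($compressed);

printf("magic=%s, %d bytes -> %d bytes\n", $magic, strlen($original), strlen($compressed));
```

Because the round trip is byte-exact, gzip on its own cannot turn valid text into garbled text.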

II. Gzip compression in PHP curl

PHP provides the curl extension for network communication; use it to send HTTP requests and receive HTTP responses. When sending a request, you configure options such as the URL, request method, and request headers with curl_setopt(). By default curl neither asks for compressed responses nor decompresses them. Setting CURLOPT_ENCODING to gzip makes curl send an Accept-Encoding: gzip request header and automatically decompress a response body that arrives with Content-Encoding: gzip.

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://example.com/path/to/api");
curl_setopt($ch, CURLOPT_HEADER, 0);         // do not include response headers in the output
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
curl_setopt($ch, CURLOPT_ENCODING, 'gzip');  // request gzip and decompress automatically
$response = curl_exec($ch);
curl_close($ch);
echo $response;
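A variation on the request above is to pass an empty string to CURLOPT_ENCODING, which makes curl advertise every encoding it was built with and decode whichever one the server picks. The sketch below assumes the curl extension is loaded; the helper names gzipAwareOptions() and fetchBody() are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical helper: collects the options for a compression-aware request.
function gzipAwareOptions(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_HEADER         => false, // body only
        CURLOPT_RETURNTRANSFER => true,  // return the body as a string
        // Empty string: send Accept-Encoding with all encodings this curl
        // build supports and decompress the response automatically.
        CURLOPT_ENCODING       => '',
    ];
}

// Hypothetical helper: runs the request and reports curl errors.
function fetchBody(string $url): string
{
    $ch = curl_init();
    curl_setopt_array($ch, gzipAwareOptions($url));
    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("curl request failed: $error");
    }
    curl_close($ch);
    return $body;
}
```

Calling fetchBody("http://example.com/path/to/api") would then return the decompressed body, or throw if the transfer fails.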

III. Problems encountered in PHP curl

In practice, responses compressed with gzip sometimes still come out garbled, and with Chinese text this is especially noticeable. Gzip itself is not the culprit: it compresses bytes and restores them exactly, whatever character encoding the text uses. The garbling has one of two causes. Either the body was never decompressed, because CURLOPT_ENCODING was not set or the server sent gzip data without a Content-Encoding: gzip header, so the raw compressed bytes are printed as if they were text. Or the body was decompressed correctly, but its character encoding (for example GBK) differs from the one the page or terminal expects (for example UTF-8).

For example, the following is the response to an HTTP request (the binary body is shown here Base64-encoded):

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 135

H4sIAAAAAAAAAG3QwQ2DMBDH8fc8D2l1p7hAUEiyklH6eAINu6cZm3jKDyL4ItZAN1RSxVVWUKpN
wIU8qf1Jc4S2uK4Wq5674tLLasa5B tU4mSivZkR5tb6637HP NzJjvY Xt1vVy5Pz5v9h D7mJj
nTfBsGsqFQAAA==

The response headers and the response body look fine, but if the body is not decompressed, for example because the request was made without CURLOPT_ENCODING, we get the following output:

�j\ko?t[��_mK”�Ix۱�E�U� c��">W��6

This output is garbled because it is the raw gzip-compressed data printed as text. The Content-Type header shows that the decompressed JSON is UTF-8 encoded, so once the body is decompressed and displayed as UTF-8 it reads correctly. Both steps, decompression and character-set handling, can be done in PHP.

IV. Solution

1. Use gzdecode to decompress

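The PHP function gzdecode() decompresses gzip-format data. Use it to decompress the raw body returned by curl yourself. In this case do not set CURLOPT_ENCODING, because curl would then already have decompressed the body and gzdecode() would fail on the plain text; send the Accept-Encoding: gzip header manually instead. Note that gzdecode() returns false when the data is not gzip, so this only works when the server actually compresses the response.

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://example.com/path/to/api");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Ask for gzip manually; CURLOPT_ENCODING is not set, so curl returns the raw compressed body
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept-Encoding: gzip'));
$response = curl_exec($ch);
curl_close($ch);
echo gzdecode($response);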
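Since gzdecode() fails on data that is not gzip, for instance when curl has already decompressed the body or the server chose not to compress it, it can help to check for the gzip identifier bytes first. This is a minimal sketch, assuming the zlib extension is loaded; the helper name decodeIfGzipped() is hypothetical.

```php
<?php
// Hypothetical helper: decompress only when the body really is gzip data.
function decodeIfGzipped(string $body): string
{
    // gzip streams start with the identifier bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) !== 0) {
        return $body; // not gzip (or already decompressed): leave untouched
    }
    // Suppress the warning gzdecode() emits on corrupt data and fall back
    // to the original bytes in that case.
    $decoded = @gzdecode($body);
    return $decoded === false ? $body : $decoded;
}
```

With this guard, echo decodeIfGzipped($response); behaves the same whether or not the body arrived compressed.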

2. Use iconv to transcode

If the body decompresses correctly but Chinese characters are still garbled, the cause is a character-set mismatch, which PHP's iconv() function can fix. Let curl decompress the body with CURLOPT_ENCODING, then convert the string from the character set of the response to the one your page or terminal uses. The example below converts a UTF-8 response for output on a GBK page; swap the two character sets if the response is GBK and your output is UTF-8.

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://example.com/path/to/api");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_ENCODING, 'gzip'); // curl decompresses the body
$response = curl_exec($ch);
curl_close($ch);
// Convert the decompressed UTF-8 text to GBK, skipping characters GBK cannot represent
$response = iconv('UTF-8', 'GBK//IGNORE', $response);
echo $response;
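Rather than hard-coding the character sets, the source charset can be read from the response's Content-Type header. The sketch below assumes the iconv extension is loaded and that you have captured the Content-Type value yourself (for example with curl_getinfo($ch, CURLINFO_CONTENT_TYPE)); the helper names charsetFromContentType() and toUtf8() are hypothetical.

```php
<?php
// Hypothetical helper: extract the charset parameter from a Content-Type value.
function charsetFromContentType(string $contentType, string $default = 'UTF-8'): string
{
    if (preg_match('/charset\s*=\s*"?([A-Za-z0-9._-]+)"?/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return $default; // assumption: treat bodies without a declared charset as UTF-8
}

// Hypothetical helper: convert an already decompressed body to UTF-8.
function toUtf8(string $body, string $contentType): string
{
    $charset = charsetFromContentType($contentType);
    if ($charset === 'UTF-8') {
        return $body; // nothing to convert
    }
    // iconv() returns false on failure; keep the original bytes in that case.
    $converted = @iconv($charset, 'UTF-8', $body);
    return $converted === false ? $body : $converted;
}
```

A call such as toUtf8($response, 'text/html; charset=gbk') would then yield UTF-8 text that displays correctly on a UTF-8 page.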

Summary

When using the PHP curl library to make HTTP requests, keep in mind that the server may enable gzip compression to reduce the amount of data transmitted and improve the performance of web applications. If the response comes out garbled, first make sure the body is actually decompressed, either by setting CURLOPT_ENCODING or by calling gzdecode() on the raw body. If Chinese characters are still garbled after that, convert the character set with iconv(). Choose the method that fits your own needs and environment.
