
Use PHP to determine whether a web page is gzip-compressed

WBOY
2016-07-21 15:03:26

Last night, a friend in the group was scraping web pages and found that pages fetched with file_get_contents were garbled when saved locally. The response headers included Content-Encoding: gzip, yet the pages displayed normally in a browser.
Having run into this before, I saw right away that the site had gzip enabled and that file_get_contents was returning the compressed page rather than the decompressed one. (I don't know whether file_get_contents can be given some parameter when requesting the page so that it directly fetches a version that has not been gzip-compressed.)
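One thing that might be tried for the open question above is to send an Accept-Encoding: identity request header through a stream context. This is only a sketch: the helper name identityContext is hypothetical, and a server is free to ignore the header and compress anyway, so it is not a guaranteed fix.

```php
<?php
// Hypothetical helper (not from the original article): build a stream context
// that asks the server for an uncompressed body. The server may ignore it.
function identityContext()
{
    return stream_context_create([
        'http' => [
            'method' => 'GET',
            'header' => "Accept-Encoding: identity\r\n",
        ],
    ]);
}

// Usage (performs a network request, so it is left commented out):
// $html = file_get_contents('http://www.miercn.com', false, identityContext());
```

If the response still starts with the gzip signature, the body has to be decompressed on the client side.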
Not long ago I read that a file's type can be determined by reading its first 2 bytes. Friends in the group also pointed out that the first 2 bytes of a gzip-compressed web page (GBK-encoded in this case) are 1F 8B, so those bytes can be used to tell whether a page has been gzip-compressed.
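As a minimal sketch of that two-byte check on data already held in a string, the helper below tests for the gzip signature 1F 8B. The function name hasGzipMagic is hypothetical and does not appear in the original code.

```php
<?php
// Hypothetical helper (not from the original article): returns true when
// $data begins with the gzip magic bytes 0x1F 0x8B.
function hasGzipMagic(string $data): bool
{
    // strncmp() is binary-safe; strings shorter than 2 bytes never match.
    return strncmp($data, "\x1f\x8b", 2) === 0;
}

var_dump(hasGzipMagic("\x1f\x8b\x08\x00")); // starts with the gzip signature
var_dump(hasGzipMagic('<html>'));           // plain HTML
```

Comparing the raw bytes avoids converting them to decimal and concatenating the digits.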
The code is as follows:


<?php
// The Mier Military Network site serves its pages gzip-compressed,
// so file_get_contents() on the plain URL returns garbled (still compressed) data.
header('Content-Type: text/html; charset=utf-8');
$url = 'http://www.miercn.com';
$file = fopen($url, 'rb');
// Read only the first 2 bytes. If they are 1f 8b (hexadecimal),
// i.e. 31 139 (decimal), the page is gzip-compressed.
$bin = fread($file, 2);
fclose($file);
$strInfo = @unpack('C2chars', $bin);
$typeCode = intval($strInfo['chars1'] . $strInfo['chars2']);
switch ($typeCode) {
    case 31139:
        $isGzip = 1;
        break; // without this, execution falls through to default and resets the flag
    default:
        $isGzip = 0;
}
// If the page is gzip-compressed, read it through the compress.zlib:// wrapper.
$url = $isGzip ? 'compress.zlib://' . $url : $url;
$mierHtml = file_get_contents($url); // Get the Mier Military Network page
$mierHtml = iconv('gbk', 'utf-8', $mierHtml); // the page is GBK-encoded
echo $mierHtml;
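The listing above opens the URL once to read two bytes and then requests the whole page again. As an alternative sketch, not the author's method, the page could be fetched once and decompressed in memory with gzdecode (zlib extension) when the signature is present. The helper name decodeIfGzipped is hypothetical.

```php
<?php
// Hypothetical helper (not from the original article): decompress $body only
// when it starts with the gzip magic bytes 0x1F 0x8B.
function decodeIfGzipped(string $body): string
{
    if (strncmp($body, "\x1f\x8b", 2) !== 0) {
        return $body; // no gzip signature: return the data unchanged
    }
    $decoded = @gzdecode($body); // returns false if the data is not valid gzip
    return $decoded === false ? $body : $decoded;
}

// Usage (performs a network request, so it is left commented out):
// $html = decodeIfGzipped(file_get_contents('http://www.miercn.com'));
// echo iconv('gbk', 'utf-8', $html);
```

This keeps detection and decompression in one place and needs only a single HTTP request.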



