采集一个网站的数据时,返回的是以chunked编码,gzip压缩的文档,该网站的服务器显示是IIS,。。。
解码chunked没问题,但是解压gzip压缩文档时,偶尔会失败,这样就影响我提取下一组请求连接了。。。
解压10组左右,就会出现解压失败的情况。。
这是解压前的数据:
解压后的数据:
显然在最后一组,解压失败了。。
这是尝试用过的三组方法:
private function _deCompressData() { if($this->is_gzip) { $this->response_body = gzinflate(substr($this->response_body,10)); // // if($temp = gzdecode($this->response_body)) {// $this->response_body = $temp;// } else {// $this->response_body = $this->mygzdecode($this->response_body);// } //$this->response_body = $this->mygzdecode($this->response_body); // $this->response_body = gzdecode($this->response_body); } }
mygzdecode函数是这一个
/** * @desc 自定义解压函数 */ function mygzdecode($data, &$filename = '', &$error = '', $maxlength = null) { $len = strlen($data); if ($len < 18 || strcmp(substr($data, 0, 2), "\x1f\x8b")) { $error = "Not in GZIP format."; return null; // Not GZIP format (See RFC 1952) } $method = ord(substr($data, 2, 1)); // Compression method $flags = ord(substr($data, 3, 1)); // Flags if ($flags & 31 != $flags) { $error = "Reserved bits not allowed."; return null; } // NOTE: $mtime may be negative (PHP integer limitations) $mtime = unpack("V", substr($data, 4, 4)); $mtime = $mtime[1]; $xfl = substr($data, 8, 1); $os = substr($data, 8, 1); $headerlen = 10; $extralen = 0; $extra = ""; if ($flags & 4) { // 2-byte length prefixed EXTRA data in header if ($len - $headerlen - 2 < 8) { return false; // invalid } $extralen = unpack("v", substr($data, 8, 2)); $extralen = $extralen[1]; if ($len - $headerlen - 2 - $extralen < 8) { return false; // invalid } $extra = substr($data, 10, $extralen); $headerlen += 2 + $extralen; } $filenamelen = 0; $filename = ""; if ($flags & 8) { // C-style string if ($len - $headerlen - 1 < 8) { return false; // invalid } $filenamelen = strpos(substr($data, $headerlen), chr(0)); if ($filenamelen === false || $len - $headerlen - $filenamelen - 1 < 8) { return false; // invalid } $filename = substr($data, $headerlen, $filenamelen); $headerlen += $filenamelen + 1; } $commentlen = 0; $comment = ""; if ($flags & 16) { // C-style string COMMENT data in header if ($len - $headerlen - 1 < 8) { return false; // invalid } $commentlen = strpos(substr($data, $headerlen), chr(0)); if ($commentlen === false || $len - $headerlen - $commentlen - 1 < 8) { return false; // Invalid header format } $comment = substr($data, $headerlen, $commentlen); $headerlen += $commentlen + 1; } $headercrc = ""; if ($flags & 2) { // 2-bytes (lowest order) of CRC32 on header present if ($len - $headerlen - 2 < 8) { return false; // invalid } $calccrc = crc32(substr($data, 0, $headerlen)) & 0xffff; $headercrc = unpack("v", substr($data, $headerlen, 2)); $headercrc = $headercrc[1]; if ($headercrc != $calccrc) { $error = "Header checksum failed."; return false; // Bad header CRC } $headerlen += 2; } // GZIP FOOTER $datacrc = unpack("V", substr($data, -8, 4)); $datacrc = sprintf('%u', $datacrc[1] & 0xFFFFFFFF); $isize = unpack("V", substr($data, -4)); $isize = $isize[1]; // decompression: $bodylen = $len - $headerlen - 8; if ($bodylen < 1) { // IMPLEMENTATION BUG! return null; } $body = substr($data, $headerlen, $bodylen); $data = ""; if ($bodylen > 0) { switch ($method) { case 8: // Currently the only supported compression method: $data = gzinflate($body, $maxlength); break; default: $error = "Unknown compression method."; return false; } } // zero-byte body content is allowed // Verifiy CRC32 $crc = sprintf("%u", crc32($data)); $crcOK = $crc == $datacrc; $lenOK = $isize == strlen($data); if (!$lenOK || !$crcOK) { $error = ( $lenOK ? '' : 'Length check FAILED. ') . ( $crcOK ? '' : 'Checksum FAILED.'); return false; } return $data; }
也就是说,连续解压时,会出现解压失败的情况
回复讨论(解决方案)
php 已经提供了 gzdecode 函数
如果你的 php 版本实在很低,没有 gzdecode 函数
那么 php 代码级的 gzdecode 函数是
function gzdecode($data) { $len = strlen($data); if ($len < 18 || strcmp(substr($data,0,2),"\x1f\x8b")) { return $data; // Not GZIP format (See RFC 1952) } $method = ord(substr($data,2,1)); // Compression method $flags = ord(substr($data,3,1)); // Flags if ($flags & 31 != $flags) { // Reserved bits are set -- NOT ALLOWED by RFC 1952 return data; } // NOTE: $mtime may be negative (PHP integer limitations) $mtime = unpack("V", substr($data,4,4)); $mtime = $mtime[1]; $xfl = substr($data,8,1); $os = substr($data,8,1); $headerlen = 10; $extralen = 0; $extra = ""; if ($flags & 4) { // 2-byte length prefixed EXTRA data in header if ($len - $headerlen - 2 < 8) { return false; // Invalid format } $extralen = unpack("v",substr($data,8,2)); $extralen = $extralen[1]; if ($len - $headerlen - 2 - $extralen < 8) { return false; // Invalid format } $extra = substr($data,10,$extralen); $headerlen += 2 + $extralen; } $filenamelen = 0; $filename = ""; if ($flags & 8) { // C-style string file NAME data in header if ($len - $headerlen - 1 < 8) { return false; // Invalid format } $filenamelen = strpos(substr($data,8+$extralen),chr(0)); if ($filenamelen === false || $len - $headerlen - $filenamelen - 1 < 8) { return false; // Invalid format } $filename = substr($data,$headerlen,$filenamelen); $headerlen += $filenamelen + 1; } $commentlen = 0; $comment = ""; if ($flags & 16) { // C-style string COMMENT data in header if ($len - $headerlen - 1 < 8) { return false; // Invalid format } $commentlen = strpos(substr($data,8+$extralen+$filenamelen),chr(0)); if ($commentlen === false || $len - $headerlen - $commentlen - 1 < 8) { return false; // Invalid header format } $comment = substr($data,$headerlen,$commentlen); $headerlen += $commentlen + 1; } $headercrc = ""; if ($flags & 1) { // 2-bytes (lowest order) of CRC32 on header present if ($len - $headerlen - 2 < 8) { return false; // Invalid format } $calccrc = crc32(substr($data,0,$headerlen)) & 0xffff; $headercrc = unpack("v", substr($data,$headerlen,2)); $headercrc = $headercrc[1]; if ($headercrc != $calccrc) { return false; // Bad header CRC } $headerlen += 2; } // GZIP FOOTER - These be negative due to PHP's limitations $datacrc = unpack("V",substr($data,-8,4)); $datacrc = $datacrc[1]; $isize = unpack("V",substr($data,-4)); $isize = $isize[1]; // Perform the decompression: $bodylen = $len-$headerlen-8; if ($bodylen < 1) { // This should never happen - IMPLEMENTATION BUG! return null; } $body = substr($data,$headerlen,$bodylen); $data = ""; if ($bodylen > 0) { switch ($method) { case 8: // Currently the only supported compression method: $data = gzinflate($body); break; default: // Unknown compression method return false; } } else { // I'm not sure if zero-byte body content is allowed. // Allow it for now... Do nothing... } // Verifiy decompressed size and CRC32: // NOTE: This may fail with large data sizes depending on how // PHP's integer limitations affect strlen() since $isize // may be negative for large sizes. if ($isize != strlen($data) || crc32($data) != $datacrc) { // Bad format! Length or CRC doesn't match! return false; } return $data; }
自己对比一下,看看是否是你抄写错了
既然函数会在 传入长度 和 crc32 校验失败时返回假,那么你就应该判断一下再进行下一步工作
php 已经提供了 gzdecode 函数
如果你的 php 版本实在很低,没有 gzdecode 函数
那么 php 代码级的 gzdecode 函数是
function gzdecode($data) { $len = strlen($data); if ($len < 18 || strcmp(substr($data,0,2),"\x1f\x8b")) { return $data; // Not GZIP format (See RFC 1952) } $method = ord(substr($data,2,1)); // Compression method $flags = ord(substr($data,3,1)); // Flags if ($flags & 31 != $flags) { // Reserved bits are set -- NOT ALLOWED by RFC 1952 return data; } // NOTE: $mtime may be negative (PHP integer limitations) $mtime = unpack("V", substr($data,4,4)); $mtime = $mtime[1]; $xfl = substr($data,8,1); $os = substr($data,8,1); $headerlen = 10; $extralen = 0; $extra = ""; if ($flags & 4) { // 2-byte length prefixed EXTRA data in header if ($len - $headerlen - 2 < 8) { return false; // Invalid format } $extralen = unpack("v",substr($data,8,2)); $extralen = $extralen[1]; if ($len - $headerlen - 2 - $extralen < 8) { return false; // Invalid format } $extra = substr($data,10,$extralen); $headerlen += 2 + $extralen; } $filenamelen = 0; $filename = ""; if ($flags & 8) { // C-style string file NAME data in header if ($len - $headerlen - 1 < 8) { return false; // Invalid format } $filenamelen = strpos(substr($data,8+$extralen),chr(0)); if ($filenamelen === false || $len - $headerlen - $filenamelen - 1 < 8) { return false; // Invalid format } $filename = substr($data,$headerlen,$filenamelen); $headerlen += $filenamelen + 1; } $commentlen = 0; $comment = ""; if ($flags & 16) { // C-style string COMMENT data in header if ($len - $headerlen - 1 < 8) { return false; // Invalid format } $commentlen = strpos(substr($data,8+$extralen+$filenamelen),chr(0)); if ($commentlen === false || $len - $headerlen - $commentlen - 1 < 8) { return false; // Invalid header format } $comment = substr($data,$headerlen,$commentlen); $headerlen += $commentlen + 1; } $headercrc = ""; if ($flags & 1) { // 2-bytes (lowest order) of CRC32 on header present if ($len - $headerlen - 2 < 8) { return false; // Invalid format } $calccrc = crc32(substr($data,0,$headerlen)) & 0xffff; $headercrc = unpack("v", substr($data,$headerlen,2)); $headercrc = $headercrc[1]; if ($headercrc != $calccrc) { return false; // Bad header CRC } $headerlen += 2; } // GZIP FOOTER - These be negative due to PHP's limitations $datacrc = unpack("V",substr($data,-8,4)); $datacrc = $datacrc[1]; $isize = unpack("V",substr($data,-4)); $isize = $isize[1]; // Perform the decompression: $bodylen = $len-$headerlen-8; if ($bodylen < 1) { // This should never happen - IMPLEMENTATION BUG! return null; } $body = substr($data,$headerlen,$bodylen); $data = ""; if ($bodylen > 0) { switch ($method) { case 8: // Currently the only supported compression method: $data = gzinflate($body); break; default: // Unknown compression method return false; } } else { // I'm not sure if zero-byte body content is allowed. // Allow it for now... Do nothing... } // Verifiy decompressed size and CRC32: // NOTE: This may fail with large data sizes depending on how // PHP's integer limitations affect strlen() since $isize // may be negative for large sizes. if ($isize != strlen($data) || crc32($data) != $datacrc) { // Bad format! Length or CRC doesn't match! return false; } return $data; }
我的是PHP 5.6 ,
gzinflate(substr($this->response_body,10));
gzdecode($this->response_body)
mygzdecode($this->response_body);
这三种方法都可以用,但都遇到同一个问题,连续解压时,会出现解压失败的问题。
大婶,新年快乐哈
自己对比一下,看看是否是你抄写错了
既然函数会在 传入长度 和 crc32 校验失败时返回假,那么你就应该判断一下再进行下一步工作
好的。
在网络上传输的数据,出现错误是不可避免的,但概率不高
重读一下,通常就可以了
主要是你要有容错策略
自己对比一下,看看是否是你抄写错了
既然函数会在 传入长度 和 crc32 校验失败时返回假,那么你就应该判断一下再进行下一步工作
// Verifiy CRC32
$crc = sprintf("%u", crc32($data));
$crcOK = $crc == $datacrc;
$lenOK = $isize == strlen($data);
if (!$lenOK || !$crcOK) {
$this->status = ( $lenOK ? '' : 'Length check FAILED. ') . ( $crcOK ? '' : 'Checksum FAILED.');
return false;
}
return $data;
检测出来了,是这里校验失败了。。。
对链接http://www.cnu.cc/works/111706发起请求
Length check FAILED. Checksum FAILED.
在网络上传输的数据,出现错误是不可避免的,但概率不高
重读一下,通常就可以了
主要是你要有容错策略
对。。。 这个地方确实需要加强。。。只做了重置连接,没有对收到数据的完整性做校验。。
在网络上传输的数据,出现错误是不可避免的,但概率不高
重读一下,通常就可以了
主要是你要有容错策略
OK了,连续采集10分钟,没出问题 。。。THX,,摸摸大
传输过程出问题,导致部分数据没有了,而解压失败。
把需要解压的文件加入解压列表,每隔5秒-10秒判断解压文件是否变化,如无变化,则解压,解压失败做标记,继续下一个解压。
传输过程出问题,导致部分数据没有了,而解压失败。

PHP is mainly procedural programming, but also supports object-oriented programming (OOP); Python supports a variety of paradigms, including OOP, functional and procedural programming. PHP is suitable for web development, and Python is suitable for a variety of applications such as data analysis and machine learning.

PHP originated in 1994 and was developed by RasmusLerdorf. It was originally used to track website visitors and gradually evolved into a server-side scripting language and was widely used in web development. Python was developed by Guidovan Rossum in the late 1980s and was first released in 1991. It emphasizes code readability and simplicity, and is suitable for scientific computing, data analysis and other fields.

PHP is suitable for web development and rapid prototyping, and Python is suitable for data science and machine learning. 1.PHP is used for dynamic web development, with simple syntax and suitable for rapid development. 2. Python has concise syntax, is suitable for multiple fields, and has a strong library ecosystem.

PHP remains important in the modernization process because it supports a large number of websites and applications and adapts to development needs through frameworks. 1.PHP7 improves performance and introduces new features. 2. Modern frameworks such as Laravel, Symfony and CodeIgniter simplify development and improve code quality. 3. Performance optimization and best practices further improve application efficiency.

PHPhassignificantlyimpactedwebdevelopmentandextendsbeyondit.1)ItpowersmajorplatformslikeWordPressandexcelsindatabaseinteractions.2)PHP'sadaptabilityallowsittoscaleforlargeapplicationsusingframeworkslikeLaravel.3)Beyondweb,PHPisusedincommand-linescrip

PHP type prompts to improve code quality and readability. 1) Scalar type tips: Since PHP7.0, basic data types are allowed to be specified in function parameters, such as int, float, etc. 2) Return type prompt: Ensure the consistency of the function return value type. 3) Union type prompt: Since PHP8.0, multiple types are allowed to be specified in function parameters or return values. 4) Nullable type prompt: Allows to include null values and handle functions that may return null values.

In PHP, use the clone keyword to create a copy of the object and customize the cloning behavior through the \_\_clone magic method. 1. Use the clone keyword to make a shallow copy, cloning the object's properties but not the object's properties. 2. The \_\_clone method can deeply copy nested objects to avoid shallow copying problems. 3. Pay attention to avoid circular references and performance problems in cloning, and optimize cloning operations to improve efficiency.

PHP is suitable for web development and content management systems, and Python is suitable for data science, machine learning and automation scripts. 1.PHP performs well in building fast and scalable websites and applications and is commonly used in CMS such as WordPress. 2. Python has performed outstandingly in the fields of data science and machine learning, with rich libraries such as NumPy and TensorFlow.


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Atom editor mac version download
The most popular open source editor

MantisBT
Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use