How to fix garbled Chinese web pages fetched with file_get_contents in PHP

WBOY
2016-07-13 10:11:29

This article describes how to fix garbled Chinese text when fetching web pages with file_get_contents in PHP. The specific method is as follows:

file_get_contents is a very convenient built-in PHP function for reading both local and remote files, and it lets us download remote data with almost no effort. However, when I used it to read some web pages, the content came back garbled. The specific solutions are summarized here.

According to other users online, the likely cause is that the server has GZIP compression enabled. I used Firebug to check the headers for my website, and gzip is indeed turned on. The original request headers are as follows:

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3
Connection: keep-alive
Cookie: __utma=225240837.787252530.1317310581.1335406161.1335411401.1537; __utmz=225240837.1326850415.887.3.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=%E4%BB%BB%E4%BD%95%E9%A1%B9%E7%9B%AE%E9%83%BD%E4%B8%8D%E4%BC%9A%E9%82%A3%E4%B9%88%E7%AE%80%E5%8D%95%20site%3Awww.nowamagic.net; PHPSESSID=888mj4425p8s0m7s0frre3ovc7; __utmc=225240837; __utmb=225240837.1.10.1335411401
Host: www.jb51.net
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0

The Content-Encoding item in the response headers is gzip, so the body that file_get_contents returns is still compressed, which is why it looks garbled.
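
To make the symptom concrete, here is a minimal sketch of how a script might recognize and decode a gzip-compressed body after it has been fetched. The helper name decode_if_gzipped is hypothetical and not part of the original article. It assumes the zlib extension is loaded (for gzencode and gzdecode) and PHP 7 or later, and it simulates the compressed response locally rather than contacting a real server.

```php
<?php
// Hypothetical helper (not from the original article): returns readable text
// whether or not the given body is gzip-compressed.
function decode_if_gzipped(string $body): string
{
    // A gzip stream always starts with the two magic bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body); // requires the zlib extension
        if ($decoded !== false) {
            return $decoded;
        }
    }
    // Not gzip (or decoding failed): hand the body back unchanged.
    return $body;
}

// Simulate a compressed response body locally instead of hitting a server.
$original   = '<html><body>中文内容</body></html>';
$compressed = gzencode($original);

var_dump(decode_if_gzipped($compressed) === $original); // bool(true)
var_dump(decode_if_gzipped($original) === $original);   // bool(true)
```

Checking the magic bytes is only one possible approach. A script could instead inspect the Content-Encoding response header before deciding whether to decode.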

The solution is relatively simple: use curl instead of file_get_contents to fetch the page, and add one option to the curl configuration. The code is as follows:

curl_setopt($ch, CURLOPT_ENCODING, "gzip");
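
For context, the option above might sit inside a complete request roughly like this. This is a sketch under assumptions, not the author's code: the function name fetch_page_with_curl, the timeout value, and the error handling are all illustrative, and the curl extension must be installed.

```php
<?php
// Hypothetical helper (not from the original article): fetch a URL with cURL
// and let cURL decompress a gzip-encoded response body.
function fetch_page_with_curl(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // illustrative timeout, in seconds
    // Sends "Accept-Encoding: gzip" and decodes a gzip response automatically.
    curl_setopt($ch, CURLOPT_ENCODING, "gzip");

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException("cURL request failed: " . curl_error($ch));
    }
    return $body;
}
```

A call such as fetch_page_with_curl("http://www.jb51.net/") would then return the decompressed HTML. Passing an empty string to CURLOPT_ENCODING instead of "gzip" asks cURL to accept every encoding it supports.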

When I used file_get_contents to fetch images today, I didn't notice this problem at first, and it took a lot of effort to track down.

Alternatively, use the built-in zlib support. If the zlib extension is installed on the server, the following code easily solves the garbled-text problem:

$data = file_get_contents("compress.zlib://".$url);
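
The effect of the compress.zlib:// wrapper can be seen without any network access. The following sketch writes a gzip-compressed temporary file and reads it back both ways. The temporary file and the sample HTML string are assumptions made for illustration, and the zlib extension must be loaded.

```php
<?php
// Write a gzip-compressed file locally so the example needs no network access.
$original = '<html><body>中文内容</body></html>';
$path     = tempnam(sys_get_temp_dir(), 'gz');
file_put_contents($path, gzencode($original));

// Reading the file directly returns the compressed bytes, which look garbled.
$raw = file_get_contents($path);

// The compress.zlib:// wrapper decompresses the stream while reading it.
$data = file_get_contents("compress.zlib://" . $path);

var_dump($raw === $original);  // bool(false)
var_dump($data === $original); // bool(true)

unlink($path); // clean up the temporary file
```

The one-line version above works the same way, with the remote $url taking the place of the local path.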

I hope this article is helpful for your PHP programming.
