How do I determine the encoding of content fetched with curl_init? A few characters come out garbled

WBOY (Original)
2016-06-13 12:27:03, 1153 views

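How can I determine the encoding of the content that curl_init returns? A few characters come out garbled and I need help!
The code is as follows: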

<?php
$url = "http://zhidao.baidu.com/link?url=pTwcJotQ02pjg-mjCnc-fkw8ONOY9x8q0ESrCFhdVJy47agZnDnCb-BCAtngRGDt9yi0TvleSS_w0aPj8Vsk0atVkVhNYdZADN0kv0BzNau";

echo fopen_url($url);

function fopen_url($url)
{
    if (function_exists('curl_init'))
    {
        $curl_handle = curl_init();
        curl_setopt($curl_handle, CURLOPT_URL, $url);
        curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
        curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($curl_handle, CURLOPT_FAILONERROR, 1);
        curl_setopt($curl_handle, CURLOPT_TIMEOUT, 2);
        $file_content = curl_exec($curl_handle);
        $encode = mb_detect_encoding($file_content, array("ASCII", "UTF-8", "GB2312", "GBK", "BIG5"));
        if ($encode != "UTF-8")
        {
            $file_content = mb_convert_encoding($file_content, "UTF-8", $encode);
            //$file_content = iconv($encode, 'utf-8//IGNORE', $file_content);
        }
        curl_close($curl_handle);
    }
    else
    {
        $file_content = '';
    }
    return $file_content;
}
?>


Strangely, a few individual characters come out garbled! Please see the screenshot.

What is causing this?

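The code above also seems to be wrong: the original page is clearly GB2312, yet it is detected as CP936. I am at a loss.

Please take a look and tell me whether the code above needs to be improved.

Many thanks!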
------Solution approach----------------------
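The returned data contains:

From it you can tell the page's encoding.

Programmatic detection is only needed when it is missing.
mb_detect_encoding frequently guesses wrong, which is why the mb_check_encoding function was added as well.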
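
The order described in this reply could be sketched roughly as follows. This assumes the declaration referred to is a charset declaration, either in the HTTP Content-Type header or in a <meta> tag; that is a guess, since the snippet did not survive. The function names detect_charset() and to_utf8(), the candidate list, and the mapping of GB2312/GBK to CP936 are illustrative choices, not part of the original code.

```php
<?php
// Sketch only: detect_charset() and to_utf8() are hypothetical helper names.
// Requires the mbstring extension.

function detect_charset(string $body, ?string $contentType = null): ?string
{
    // 1. A charset in the HTTP header, e.g. "text/html; charset=gb2312".
    if ($contentType !== null
        && preg_match('/charset\s*=\s*["\']?([\w-]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    // 2. A charset declared in a <meta> tag inside the document.
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w-]+)/i', $body, $m)) {
        return strtoupper($m[1]);
    }
    // 3. Nothing declared: validate against candidates with mb_check_encoding.
    //    The candidate list and its order are assumptions for this sketch.
    foreach (['UTF-8', 'CP936', 'BIG-5'] as $candidate) {
        if (mb_check_encoding($body, $candidate)) {
            return $candidate;
        }
    }
    return null;
}

function to_utf8(string $body, ?string $contentType = null): string
{
    $charset = detect_charset($body, $contentType);
    if ($charset === null || $charset === 'UTF-8') {
        return $body;
    }
    // Pages labelled GB2312 often contain GBK-only characters; CP936 covers both.
    if ($charset === 'GB2312' || $charset === 'GBK') {
        $charset = 'CP936';
    }
    return mb_convert_encoding($body, 'UTF-8', $charset);
}
```

To feed the header in, the Content-Type can be read after curl_exec with curl_getinfo($curl_handle, CURLINFO_CONTENT_TYPE). A declared charset that mbstring does not recognise would still need handling; this sketch does not cover that case.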

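Data fragment

There is no reason for illegal characters to appear in it.

CP936 is the international name (the Windows code page) for GBK, which is a superset of GB2312.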
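
A small check of that relationship, as a sketch: for characters that exist in GB2312, encoding to EUC-CN (the usual byte encoding of GB2312) and to CP936 should give the same bytes, so decoding a GB2312 page as CP936 is safe. The sample string is an arbitrary choice for illustration.

```php
<?php
// "中文" as UTF-8, written with byte escapes so the file's own encoding does not matter.
$utf8 = "\xE4\xB8\xAD\xE6\x96\x87";

// EUC-CN is the usual byte encoding of GB2312; CP936 is the GBK code page.
$asGb2312 = mb_convert_encoding($utf8, 'EUC-CN', 'UTF-8');
$asCp936  = mb_convert_encoding($utf8, 'CP936', 'UTF-8');

// Both hex dumps are expected to be identical for GB2312 characters.
var_dump(bin2hex($asGb2312), bin2hex($asCp936));

// Decoding the GB2312 bytes as CP936 gives back the original text.
$roundTrip = mb_convert_encoding($asGb2312, 'UTF-8', 'CP936');
var_dump($roundTrip === $utf8);
```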
------Solution approach----------------------
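First question: that is not garbled text, it is images. When a Baidu page is fetched with curl, Baidu deliberately replaces some of the text with images to prevent scraping. If you inspect the page elements, you will find that the "garbled" parts are actually Baidu image URLs.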
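
One way to check this without a browser is to list the image URLs in the fetched HTML. The following is only a sketch: list_image_sources() is a hypothetical helper, and the sample markup is made up for illustration, not taken from the Baidu page.

```php
<?php
// Hypothetical helper: collect the src attribute of every <img> tag.
function list_image_sources(string $html): array
{
    preg_match_all('/<img\b[^>]*\bsrc\s*=\s*["\']([^"\']+)["\']/i', $html, $matches);
    return $matches[1];
}

// Made-up sample standing in for the fetched page.
$sample = '<p>answer text <img src="http://example.com/word1.png"> more text</p>';
print_r(list_image_sources($sample));
```

If the positions of the "garbled" characters line up with entries in this list, the text was replaced by images and no encoding conversion will bring it back.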

Second question: set the timeouts to larger values and it should work; it may simply be a problem with your network.
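
A sketch of what larger timeouts might look like, with the curl error reported instead of silently passing false on to the encoding functions. The values 10 and 30 seconds and the names build_curl_options() and fetch_url() are assumptions for illustration; pick values that suit your network.

```php
<?php
// Hypothetical helper: the same options as in the question, with longer timeouts.
function build_curl_options(string $url, int $connectTimeout = 10, int $timeout = 30): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FAILONERROR    => true,
        CURLOPT_CONNECTTIMEOUT => $connectTimeout, // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeout,        // seconds allowed for the whole request
    ];
}

// Hypothetical helper: fetch a URL and surface curl failures such as timeouts.
function fetch_url(string $url): string
{
    $ch = curl_init();
    curl_setopt_array($ch, build_curl_options($url));
    $body = curl_exec($ch);
    if ($body === false) {
        // curl_exec returns false on failure; report why instead of converting it.
        throw new RuntimeException(
            sprintf('curl error %d: %s', curl_errno($ch), curl_error($ch))
        );
    }
    return $body;
}
```

With the original 2-second limits, a slow response makes curl_exec return false, and the later mb_detect_encoding call then receives that false value rather than page content.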
