
Fetching a web page's TITLE from its URL: some sites can't be fetched, my approach is clumsy, advice wanted

WBOY, 2016-06-13 12:34:06


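Last edited by u012716911 on 2013-11-04 11:25:29.
I wrote this code the way I thought it should work, and I don't know whether there is a better approach. Any pointers would be appreciated.
For some sites the title is fetched fine, Baidu for example, but for others it is not, such as the PCauto (太平洋汽车) home page.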

public function set_title()
{
	// Read the URL submitted with the request
	$url = $_POST['url'];
	// $url = "www.pcauto.com.cn"; // the title cannot be fetched for this one!

	// cURL setup
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_URL, $url);
	curl_setopt($ch, CURLOPT_HEADER, 0);
	curl_setopt($ch, CURLOPT_ENCODING, 'gzip');
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
	$content_source = curl_exec($ch);
	curl_close($ch);

	// Detect the character encoding of the fetched content
	$encode = mb_detect_encoding($content_source, array('GB2312', 'GBK', 'UTF-8', 'ASCII'));

	// Convert to UTF-8
	$content_source = iconv($encode, 'utf-8//IGNORE', $content_source);

	// Extract the <title>
	if (preg_match("/<title>(.*?)<\/title>/i", $content_source, $title))
	{
		echo $title[1];
	}
	else
	{
		echo 'Failed to fetch the title';
	}
}
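
One possible direction, offered as a sketch rather than a confirmed fix. The failures may come from the encoding step: when `mb_detect_encoding` returns `false` or guesses wrong, `iconv` can return `false`, which leaves the regex nothing to match. They may also come from a `<title>` tag that carries attributes or spans several lines, or from a redirect that is not followed. The version below assumes PHP 7.1+ with the curl and mbstring extensions. `extract_title`, `fetch_title`, and the User-Agent string are made-up names for illustration, and the code has not been verified against www.pcauto.com.cn.

```php
<?php
// Hypothetical helpers; the names are not from the original post.

/**
 * Return the <title> text of $html as UTF-8, or null if there is none.
 * $declared is an optional charset taken from the Content-Type header.
 */
function extract_title(string $html, ?string $declared = null): ?string
{
    // Prefer a declared charset (header first, then <meta>) over guessing.
    if ($declared === null
        && preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w-]+)/i', $html, $m)) {
        $declared = $m[1];
    }
    $charset = strtoupper((string) $declared);
    // GB2312 is a subset of GBK, so decoding as GBK is the safer choice.
    if ($charset === 'GB2312') {
        $charset = 'GBK';
    }
    // Unknown or missing declaration: fall back to strict detection.
    if (!in_array($charset, ['UTF-8', 'GBK', 'GB18030'], true)) {
        $detected = mb_detect_encoding($html, ['UTF-8', 'GBK'], true);
        $charset = $detected !== false ? $detected : 'UTF-8';
    }

    // Allow attributes on <title>; the "s" flag lets "." match newlines.
    if (!preg_match('/<title\b[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return null;
    }

    // Convert only the title, decode entities, collapse whitespace.
    $title = mb_convert_encoding($m[1], 'UTF-8', $charset);
    $title = html_entity_decode($title, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return trim(preg_replace('/[ \t\r\n]+/', ' ', $title));
}

/** Fetch $url with cURL and return its title, or null on failure. */
function fetch_title(string $url): ?string
{
    // Make the scheme explicit when the user typed a bare host name.
    if (!preg_match('#^https?://#i', $url)) {
        $url = 'http://' . $url;
    }
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_ENCODING       => '',    // accept every encoding cURL supports
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_TIMEOUT        => 15,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; TitleFetcher/1.0)', // assumed value
    ]);
    $html = curl_exec($ch);
    $contentType = (string) curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    if ($html === false) {
        return null; // network error, timeout, bad host, ...
    }
    $headerCharset = preg_match('/charset\s*=\s*"?([\w-]+)/i', $contentType, $m)
        ? $m[1]
        : null;
    return extract_title($html, $headerCharset);
}
```

Inside `set_title()`, the body could then shrink to a call such as `$title = fetch_title($_POST['url']);` followed by a null check. Because the title comes from a remote page, passing it through `htmlspecialchars()` before echoing it is probably a good idea.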