
PHP function to get a web page's title and content (without HTML tags)

WBOY | Original: 2016-06-06 20:25:18 | 1112 views

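Sometimes we need to fetch a web page's title and content. This is a simple scraping function, shared here for anyone who needs it.

The code is as follows: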

function getPageContent($url) {

        // $url = 'http://www.ttphp.com';

        $pageinfo = array();
        $pageinfo['content_type'] = '';
        $pageinfo['charset'] = '';
        $pageinfo['title'] = '';
        $pageinfo['description'] = '';
        $pageinfo['keywords'] = '';
        $pageinfo['body'] = '';
        $pageinfo['httpcode'] = 200;
        $pageinfo['all'] = '';

        $ch = curl_init();
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
        curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER,0);
        curl_setopt($ch, CURLOPT_TIMEOUT, 8);
        curl_setopt($ch, CURLOPT_FILETIME, 1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        //curl_setopt($ch, CURLOPT_HEADER, 1);
        curl_setopt($ch, CURLOPT_URL,$url);

        $curl_start = microtime(true);
        $store = curl_exec ($ch);

        $curl_time = microtime(true) - $curl_start;
        if( curl_error($ch) ) {
            $pageinfo['httpcode'] = 505; // used here to flag a cURL failure (gateway error)
            echo 'Curl error: ' . curl_error($ch) . "\n";
            return $pageinfo;
        }

        //print_r(curl_getinfo($ch));
        $pageinfo['httpcode'] = curl_getinfo($ch,CURLINFO_HTTP_CODE);
        //echo curl_getinfo($ch,CURLINFO_CONTENT_TYPE)."\n";
        $pageinfo['content_type'] = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
        if (intval($pageinfo['httpcode']) != 200 or !preg_match('@text/html@', (string) $pageinfo['content_type'])) {
                //print_r(curl_getinfo($ch) );
                //exit;
                return $pageinfo;
        }
        preg_match('/charset=([^\s\n\r]+)/i', (string) curl_getinfo($ch, CURLINFO_CONTENT_TYPE), $matches); // take the charset from the header
        if (!empty($matches[1])) {
            $pageinfo['charset'] = trim($matches[1]);
        }
        //echo $pageinfo[charset];
        //exit;
        curl_close ($ch);
        //echo $store;

        // remove JavaScript
        $store = preg_replace("/<script.*<\/script>/smUi", '', $store);
        $store = preg_replace("//smUi", '', $store); // pattern lost in the source
        //remove
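
The listing breaks off before the title, description, keywords and body fields are filled in. As a rough illustration of that step, here is a minimal sketch that works on an HTML string such as `$store`. It is not the author's original code: the helper name `extractPageFields` and all of its patterns are assumptions, and the meta pattern only handles tags where `name` comes directly before `content`.

```php
<?php
// Hypothetical helper, not part of the original function: pulls the title,
// meta description/keywords and tag-free body text out of an HTML string.
function extractPageFields($html) {
    $fields = array('title' => '', 'description' => '', 'keywords' => '', 'body' => '');

    // Drop script and style blocks first so their contents do not leak into the text.
    $html = preg_replace('@<script\b.*?</script>@si', '', $html);
    $html = preg_replace('@<style\b.*?</style>@si', '', $html);

    // <title>...</title>
    if (preg_match('@<title[^>]*>(.*?)</title>@si', $html, $m)) {
        $fields['title'] = trim(strip_tags($m[1]));
    }

    // <meta name="description|keywords" content="..."> (name must precede content)
    foreach (array('description', 'keywords') as $name) {
        $pattern = '@<meta\s+name=["\']' . $name . '["\']\s+content=(["\'])(.*?)\1@si';
        if (preg_match($pattern, $html, $m)) {
            $fields[$name] = trim($m[2]);
        }
    }

    // Body text: strip the tags, then collapse runs of whitespace.
    if (preg_match('@<body[^>]*>(.*)</body>@si', $html, $m)) {
        $fields['body'] = trim(preg_replace('/\s+/', ' ', strip_tags($m[1])));
    }

    return $fields;
}
```

Regular expressions are enough for a quick scraper like this one, but they are fragile on malformed markup. For anything stricter, an HTML parser such as PHP's `DOMDocument` is the safer choice.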

Usage example

The code is as follows:

$a = getPageContent('http://www.ttphp.com');
print_r($a);
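
The returned array carries `httpcode`, `content_type` and `charset`, so the caller can check whether the fetch succeeded and normalize the encoding before using the text. A minimal sketch of that, assuming the mbstring extension is available: the helper names `isUsablePage` and `toUtf8` are hypothetical and are not part of the original function.

```php
<?php
// Hypothetical helpers for consuming the array returned by getPageContent().

// True when the fetch returned HTTP 200 with a text/html content type.
function isUsablePage(array $pageinfo) {
    return intval($pageinfo['httpcode']) === 200
        && stripos((string) $pageinfo['content_type'], 'text/html') !== false;
}

// Convert text to UTF-8 when a different charset was reported.
// Assumes $charset is a name mbstring recognizes; an unknown name raises an error.
function toUtf8($text, $charset) {
    $charset = strtoupper(trim($charset));
    if ($charset === '' || $charset === 'UTF-8') {
        return $text; // nothing reported, or already UTF-8
    }
    return mb_convert_encoding($text, 'UTF-8', $charset);
}
```

With these, a caller could guard the output with `if (isUsablePage($a)) { echo toUtf8($a['title'], $a['charset']); }` instead of printing the whole array.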
