Detailed introduction to how PHP+JavaScript crawls web content
This article describes in detail how to crawl web content with PHP and JavaScript. It may be a useful reference for anyone who needs it.
It is often assumed that only Python can crawl web content, because Python has many libraries that make crawling web pages convenient. Combining PHP and JavaScript is also very convenient, though: you can get the web content you want, and it does not have to be complicated.
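<?php
// Allow access from all origins
header("Access-Control-Allow-Origin: *");

// Read one query parameter, named "mod" (empty when it is not supplied)
$parm = isset($_GET['mod']) ? $_GET['mod'] : '';

if (empty($parm)) {
    // No parameter: fetch the home page
    $url = 'http://m.80s.tw/';
} else {
    // Otherwise append the parameter to the base address
    $url = 'http://m.80s.tw/' . $parm;
}
$html = file_get_contents($url);

// Use a regular expression to match the content of the body element
preg_match("/<body[^>]*?>(.*\s*?)<\/body>/is", $html, $match1);

// Output the matched markup (nothing if the fetch or the match failed)
echo isset($match1[0]) ? $match1[0] : '';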
Note: If file_get_contents raises an error, find the line extension=php_openssl.dll in php.ini and enable it by removing the leading semicolon.
Next, write an asynchronous request on the front end:
$.ajax({
    type: 'get',
    url: '.././admin/test.php',
    success: function (data) {
        // The HTML returned by the PHP script is printed here
        console.log(data);
    }
});
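The PHP script reads an optional `mod` parameter and appends it to the target site's address, so the same request can fetch a sub-page as well as the home page. The following is a minimal sketch of how the request URL might be built. The helper name `buildProxyUrl` and the path `movie/list` are made up for illustration and do not come from the original article.

```javascript
// Hypothetical helper (not part of the original article): builds the URL of
// the PHP script, optionally adding the "mod" query parameter that the
// script appends to the target site's address.
function buildProxyUrl(scriptUrl, mod) {
    if (!mod) {
        // No sub-page requested: the PHP script falls back to the home page.
        return scriptUrl;
    }
    // encodeURIComponent keeps characters such as "/" or "&" from being
    // misread as part of the query string; PHP decodes them in $_GET.
    return scriptUrl + '?mod=' + encodeURIComponent(mod);
}

// 'movie/list' is an invented example path, used only to show the encoding.
console.log(buildProxyUrl('.././admin/test.php', 'movie/list'));
```

The result can be passed as the `url` option of the `$.ajax` call in place of the fixed string.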
How do you use this HTML? It is not difficult:
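// First create a container
var p = document.createElement('p');
// Store the whole HTML string in this p node
p.innerHTML = data;
// The p node can now be queried freely,
// for example to get all a tags under the class list_mov_title
var list = p.querySelectorAll('.list_mov_title a');
// Print the result to take a look
console.log(list);
// Everything you want should be there,
// so insert what you need into your own page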
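The last step, inserting the results into your own page, could be sketched as follows. The helper names `extractLinks` and `renderLinks` are hypothetical, and this is one possible approach rather than the author's own code. It copies the text and `href` of each matched anchor into plain objects and then builds fresh list items from them.

```javascript
// Hypothetical helper (not from the original article): turns the anchors
// matched by querySelectorAll into plain { text, href } objects.
function extractLinks(anchors) {
    // Array.from accepts any array-like value, including a NodeList.
    return Array.from(anchors, function (a) {
        return {
            text: (a.textContent || '').trim(),
            // getAttribute returns the href exactly as written in the markup,
            // without resolving it against the current page's address.
            href: a.getAttribute('href')
        };
    });
}

// Hypothetical helper: appends one list item per link to a container element.
// The document object is passed in so the function has no hidden globals.
function renderLinks(container, links, doc) {
    links.forEach(function (link) {
        var item = doc.createElement('li');
        var anchor = doc.createElement('a');
        anchor.textContent = link.text; // textContent does not parse markup
        anchor.setAttribute('href', link.href);
        item.appendChild(anchor);
        container.appendChild(item);
    });
}
```

In the browser this might be called as `renderLinks(document.querySelector('#movies'), extractLinks(list), document)`, where `#movies` is an assumed element in your own page. Links copied this way are likely to be relative to the crawled site, so they may need to be rewritten before they work on your page.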
That concludes this tutorial on crawling web content. If you found it helpful, please share it. If anything is unclear, please leave a comment.