curl - How can PHP scrape the publication time of news on a web page on a schedule?

WBOY | Original: 2016-06-06 20:37:18 | 1296 views

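The project requires scraping the content of a web page every day in the early morning. Any advice would be appreciated, ideally with a short code outline. Thanks.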

Replies:

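1. For scheduling, use the Linux tool crontab.
2. For fetching, PHP's file_get_contents function is usually enough; if it is not, use the php_curl extension.
3. For extracting the content, match it with regular expressions.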
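
The fetch-and-match steps above could be sketched roughly as follows. This is a minimal sketch, not the poster's actual code: the function names `fetch_page` and `extract_news_time` are hypothetical, the timestamp pattern assumes the `YYYY-MM-DD HH:MM:SS` form, and `file_get_contents` on a URL assumes `allow_url_fopen` is enabled.

```php
<?php
/**
 * Fetch a page body. Tries file_get_contents first and falls back to the
 * curl extension if that fails (hypothetical helper, not from the thread).
 */
function fetch_page(string $url, int $timeout = 10): ?string
{
    $context = stream_context_create(['http' => ['timeout' => $timeout]]);
    $body = @file_get_contents($url, false, $context);
    if ($body !== false) {
        return $body;
    }
    if (!function_exists('curl_init')) {
        return null; // curl extension not installed
    }
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => $timeout,
    ]);
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

/**
 * Return the first "YYYY-MM-DD HH:MM:SS" string found in the HTML, or null.
 * The pattern is ASCII-only, so it also works on GBK-encoded pages.
 */
function extract_news_time(string $html): ?string
{
    if (preg_match('/\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/', $html, $m) === 1) {
        return $m[0];
    }
    return null;
}
```

A call such as `extract_news_time(fetch_page($url) ?? '')` would then yield the time string or null. Taking the first match is a simplification; a real page may need a pattern anchored to the element that holds the publication time.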

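News articles usually show their publication time. For example, http://news.163.com/15/0313/03/AKIB93GC00014AED.html contains the timestamp 2015-03-13 03:20:29.
If a page has none, keep in mind that news sites mostly serve static pages, so you can look at the HTTP headers instead, for example: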

curl 'http://news.163.com/15/0313/03/AKIB93GC00014AED.html' --head
HTTP/1.1 200 OK
Server: FSCS/1.2.5
Date: Fri, 13 Mar 2015 01:23:25 GMT
Content-Type: text/html; charset=GBK
Content-Length: 162187
Connection: keep-alive
Last-Modified: Fri, 13 Mar 2015 01:18:25 GMT
Vary: Accept-Encoding
ETag: "55023ae1-2798b"
......

Here, Last-Modified can be treated as an approximation of the article's time.
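
In PHP, the same header could be read along these lines. This is a sketch under assumptions: `last_modified_timestamp` is a hypothetical helper that takes raw header lines, such as the list returned by `get_headers($url)`, and returns a Unix timestamp.

```php
<?php
/**
 * Find the first Last-Modified header in a list of raw header lines and
 * return it as a Unix timestamp, or null if it is absent or unparseable.
 */
function last_modified_timestamp(array $headerLines): ?int
{
    $name = 'Last-Modified:';
    foreach ($headerLines as $line) {
        // Header names are case-insensitive, so compare with stripos.
        if (is_string($line) && stripos($line, $name) === 0) {
            $ts = strtotime(trim(substr($line, strlen($name))));
            return $ts === false ? null : $ts;
        }
    }
    return null;
}
```

Usage might look like `$lines = @get_headers($url); $ts = $lines ? last_modified_timestamp($lines) : null;`. Note that `get_headers` sends a GET request by default, and when redirects are followed the list contains the headers of every response, so the first match is not necessarily from the final page.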

Write the scraping script, then have Linux crontab run it on a schedule.
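
A cron-friendly entry point might look like the sketch below. Everything here is an assumption for illustration: the file name `scrape.php`, the paths in the crontab line, the 00:30 run time, and the helper `run_job`. The script prints one log line and uses its exit code to signal success or failure, so cron's output redirection captures the result.

```php
<?php
// scrape.php: hypothetical entry point meant to be run by cron.
// Example crontab line (assumed paths), every day at 00:30:
//   30 0 * * * /usr/bin/php /path/to/scrape.php 'http://example.com/news.html' >> /path/to/scrape.log 2>&1

/**
 * Run one fetch and describe the outcome.
 * $fetch is any callable that takes a URL and returns the body or false.
 * Returns [exit code, log line]; the log time is formatted in UTC.
 */
function run_job(string $url, callable $fetch, int $now): array
{
    $body = $fetch($url);
    $ok = is_string($body) && $body !== '';
    $line = sprintf('[%s] %s %s', gmdate('Y-m-d H:i:s', $now), $ok ? 'OK' : 'FAIL', $url);
    return [$ok ? 0 : 1, $line];
}

// Only run when invoked from the command line with a URL argument.
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $fetch = function ($u) {
        return @file_get_contents($u); // assumes allow_url_fopen is enabled
    };
    [$code, $line] = run_job($argv[1], $fetch, time());
    echo $line, PHP_EOL;
    exit($code);
}
```

Passing the fetcher in as a callable keeps the scheduling wrapper separate from how the page is actually downloaded and parsed.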
