I recently needed a script to scrape some data from the web, so I wrote the usual kind of PHP script. Here is the test code:
```php
#!/usr/local/bin/php -q
<?php
/**
 * Created by PhpStorm.
 * User: jackqqxu
 * Date: 14-9-12
 * Time: 12:34 AM
 * Parse the files in a directory, find every static resource they reference,
 * and download it.
 */

$destPath     = '/Users/jackqqxu/task/aliyunsvn/health/grain/views/locations/'; // where downloaded HTML pages are saved
$sourcePath   = '/Users/jackqqxu/task/aliyunsvn/health/grain/js/';              // local JS files to scan
$netSourceUrl = 'http://www.zamolski.com/agot/views/locations/';                 // remote location pages

// Scan .js files for gotoLocation("...") calls; many location pages need fetching.
// (For stylesheets, set $type = '.css' and match url\((.*?)\) instead to pull images.)
$type    = '.js';
$typeLen = strlen($type);

// Create the destination directory if it does not exist yet.
if (!is_dir($destPath)) {
    mkdir($destPath, 0777, true);
}

if ($dh = opendir($sourcePath)) {
    while (($file = readdir($dh)) !== false) {
        // Skip directories, including "." and "..".
        if (!is_file($sourcePath . $file)) {
            continue;
        }
        // Only process files whose extension matches $type.
        if (substr($file, -$typeLen) !== $type) {
            continue;
        }
        echo 'Scanning ' . $sourcePath . $file . "\n";
        foreach (file($sourcePath . $file) as $fileLine) {
            // Other pages are referenced through the gotoLocation keyword.
            if (!preg_match_all('/gotoLocation\("(.*?)"\)/', $fileLine, $matches)) {
                continue;
            }
            foreach ($matches[1] as $matchUrl) {
                $sourceUrl = $netSourceUrl . $matchUrl . '.html';
                $destFile  = $destPath . $matchUrl . '.html';
                echo 'Fetching ' . $sourceUrl . "\n";
                $html = file_get_contents($sourceUrl);
                // Do not write an empty file when the download fails.
                if ($html === false) {
                    echo "Failed to download $sourceUrl\n";
                    continue;
                }
                if (file_put_contents($destFile, $html) !== false) {
                    echo "File $destFile written successfully\n";
                }
            }
        }
    }
    closedir($dh);
}
```
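The script downloads a page every time it sees a `gotoLocation("...")` call. If several files or lines name the same location, that page is fetched repeatedly.

Below is a minimal sketch that collects the unique targets first and only then builds the download list. It assumes the same regex and the same URL and file naming as the script. The helper names `extractLocations` and `buildDownloadJobs` and the `/tmp/locations/` directory are hypothetical and are not part of the original code.

```php
<?php
// Hypothetical helper: return the unique gotoLocation("...") targets
// found in a chunk of JavaScript source, in first-seen order.
function extractLocations(string $source): array
{
    // Same pattern as the script: gotoLocation("name") captures name.
    preg_match_all('/gotoLocation\("(.*?)"\)/', $source, $matches);
    // array_unique drops repeats; array_values reindexes from 0.
    return array_values(array_unique($matches[1]));
}

// Hypothetical helper: map each location name to its remote URL and local file,
// using the same "<name>.html" convention as the script.
function buildDownloadJobs(array $names, string $baseUrl, string $destDir): array
{
    $jobs = [];
    foreach ($names as $name) {
        $jobs[] = [
            'url'  => $baseUrl . $name . '.html',
            'file' => rtrim($destDir, '/') . '/' . $name . '.html',
        ];
    }
    return $jobs;
}

// Example: "forest" appears twice but is only scheduled once.
// /tmp/locations/ is a hypothetical destination directory.
$js = 'gotoLocation("forest"); gotoLocation("cave"); gotoLocation("forest");';
$jobs = buildDownloadJobs(
    extractLocations($js),
    'http://www.zamolski.com/agot/views/locations/',
    '/tmp/locations/'
);
foreach ($jobs as $job) {
    echo $job['url'], ' -> ', $job['file'], "\n";
}
```

The example prints two lines, one for `forest.html` and one for `cave.html`. Each job can then be passed to `file_get_contents`/`file_put_contents` as in the main loop. Keeping extraction as a pure function also lets you check the regex without touching the network.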

Copyright © web代码网 [Web Crawler Script], All Rights Reserved. 2014.
