Web pages
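In my spare time I wanted to try scraping a few web pages. I don't know much about it and this is just for fun, so I wrote a bit of code to amuse myself.
The somewhat useful functions are get_content_by_socket(), get_url(), get_content_url() and get_content_object(); perhaps they will give you some ideas.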
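<?php
// Collect all content URLs from the index pages and save them to a file
function get_index($save_file, $prefix = "index_") {
    $count = 68;
    $i = 1;
    if (file_exists($save_file)) @unlink($save_file);
    $fp = fopen($save_file, "a+") or die("Open ". $save_file ." failed");
    while ($i <= $count) {
        $url = $prefix . $i .".htm";
        echo "Get ". $url ."...";
        // get_content_url() needs a base URL to prepend to each link;
        // the directory of the index page is assumed here
        $base = substr($url, 0, strrpos($url, "/") + 1);
        $url_str = get_content_url($base, get_url($url));
        echo " OK\n";
        fwrite($fp, $url_str);
        ++$i;
    }
    fclose($fp);
}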
// Fetch the target multimedia objects
function get_object($url_file, $save_file, $split = "|--:**:--|") {
    if (!file_exists($url_file)) die($url_file ." not exist");
    // One URL per line; drop line endings and blank lines while reading
    $file_arr = file($url_file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if (!is_array($file_arr) || empty($file_arr)) die($url_file ." has no content");
    $url_arr = array_unique($file_arr);
    if (file_exists($save_file)) @unlink($save_file);
    $fp = fopen($save_file, "a+") or die("Open save file ". $save_file ." failed");
    foreach ($url_arr as $url) {
        $url = trim($url);
        if ($url === "") continue;
        echo "Get ". $url ."...";
        $html_str = get_url($url);
        $obj_str = get_content_object($html_str, $split);
        echo " OK\n";
        fwrite($fp, $obj_str);
    }
    fclose($fp);
}
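// Walk a directory and extract content from every file in it
function get_dir($save_file, $dir) {
    $dp = opendir($dir) or die("Open dir ". $dir ." failed");
    if (file_exists($save_file)) @unlink($save_file);
    $fp = fopen($save_file, "a+") or die("Open save file ". $save_file ." failed");
    // Make sure the directory path ends with exactly one separator
    $dir = rtrim($dir, "/\\") . "/";
    // Strict comparison, so a file named "0" does not end the loop early
    while (($file = readdir($dp)) !== false) {
        if ($file != "." && $file != ".." && is_file($dir . $file)) {
            echo "Read file ". $file ."...";
            $file_content = file_get_contents($dir . $file);
            $obj_str = get_content_object($file_content);
            echo " OK\n";
            fwrite($fp, $obj_str);
        }
    }
    closedir($dp);
    fclose($fp);
}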
// Fetch the content of the given URL (needs allow_url_fopen enabled)
function get_url($url) {
    $reg = '/^http:\/\/[^\/].+$/';
    if (!preg_match($reg, $url)) die($url ." invalid");
    $fp = fopen($url, "r") or die("Open url: ". $url ." failed.");
    $content = "";
    // Read until EOF; testing the chunk itself would stop on a chunk equal to "0"
    while (!feof($fp)) {
        $content .= fread($fp, 8192);
    }
    fclose($fp);
    if (empty($content)) {
        die("Get url: ". $url ." content failed.");
    }
    return $content;
}
// Fetch a page over a raw socket
function get_content_by_socket($url, $host) {
    $fp = fsockopen($host, 80) or die("Open ". $url ." failed");
    // HTTP/1.0 keeps the server from using chunked transfer encoding, which this reader does not decode
    $header = "GET /". $url ." HTTP/1.0\r\n";
    $header .= "Accept: */*\r\n";
    $header .= "Accept-Language: zh-cn\r\n";
    // No Accept-Encoding header: a gzip/deflate body would come back compressed and unreadable here
    $header .= "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Maxthon; InfoPath.1; .NET CLR 2.0.50727)\r\n";
    $header .= "Host: ". $host ."\r\n";
    //$header .= "Cookie: cnzz02=2; rtime=1; ltime=1148456424859; cnzz_eid=56601755-\r\n\r\n";
    // Only one Connection header is sent; Keep-Alive would contradict Close
    $header .= "Connection: Close\r\n\r\n";
    fwrite($fp, $header);
    $contents = "";
    while (!feof($fp)) {
        $contents .= fgets($fp, 8192);
    }
    fclose($fp);
    // The result still includes the status line and the response headers
    return $contents;
}
// Extract the URLs found in the given content
function get_content_url($host_url, $file_contents) {
    //$reg = '/^(#|javascript.*?|ftp:\/\/.+|http:\/\/.+|.*?href.*?|play.*?|index.*?|.*?asp)+$/i';
    //$reg = '/^(down.*?\.html|\d+_\d+\.htm.*?)$/i';
    // The /i flag already makes "href" case-insensitive
    $rex = "/href\s*=\s*['\"]*([^>'\"\s]+)[\"'>]*\s*/i";
    $reg = '/^(down.*?\.html)$/i';
    preg_match_all($rex, $file_contents, $r);
    $result = "";
    // $r[1] holds the captured href values; keep only the download pages
    foreach ($r[1] as $d) {
        if (preg_match($reg, $d)) { $result .= $host_url . $d ."\n"; }
    }
    return $result;
}
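// Extract the multimedia file link from the given content
function get_content_object($str, $split = "|--:**:--|") {
    // The trailing lazy (.*?) has nothing after it, so group 2 always captures an empty string
    $regx = "/href\s*=\s*['\"]*([^>'\"\s]+)[\"'>]*\s*(.*?)/i";
    // Return an empty string when nothing matches, so callers can always fwrite() the result
    if (!preg_match_all($regx, $str, $result)) return "";
    // Strip the page's "多媒体: " (multimedia) label from the captured text
    $result[2] = str_replace("多媒体: ", "", $result[2]);
    return $result[1][0] . $split . $result[2][0] . "\n";
}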
?>
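
The regular expression in get_content_url() can trip over unusual attribute quoting. As a rough alternative, here is a sketch that does the same filtering with PHP's DOM extension instead of a regex. extract_links() and the example.com URL are made up for illustration and are not part of the script above; the sketch assumes the dom extension is enabled.

```php
<?php
// Hypothetical helper (not in the original script): a DOM-based take on get_content_url().
// Returns an array of absolute URLs instead of a newline-joined string.
function extract_links(string $html, string $base_url, string $pattern = '/^down.*?\.html$/i'): array
{
    // loadHTML() rejects an empty string, so bail out early
    if (trim($html) === '') {
        return [];
    }

    // Real-world HTML is rarely valid; collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        // Same filter as the original: keep only hrefs that look like download pages
        if ($href !== '' && preg_match($pattern, $href)) {
            $links[] = $base_url . $href;
        }
    }
    // Drop duplicates and reindex from 0
    return array_values(array_unique($links));
}

// Usage with an inline sample page (assumed markup, for illustration only)
$html = '<a href="down1.html">one</a> <a HREF=\'down2.html\'>two</a> <a href="about.html">x</a>';
print_r(extract_links($html, 'http://example.com/'));
```

The parser handles quoting and attribute case itself, so the only pattern left to maintain is the filter on the link name.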

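
get_content_by_socket() returns everything the server sent, status line and headers included. Below is a minimal sketch of splitting such a raw response into its parts. split_http_response() is a hypothetical helper, not part of the original script, and it assumes an unchunked, uncompressed response.

```php
<?php
// Hypothetical helper: split a raw HTTP response into status line, headers and body.
function split_http_response(string $raw): array
{
    // Headers and body are separated by the first blank line (CRLF CRLF)
    $parts = explode("\r\n\r\n", $raw, 2);
    $body = $parts[1] ?? '';

    $lines = explode("\r\n", $parts[0]);
    $status = array_shift($lines); // e.g. "HTTP/1.0 200 OK"

    $headers = [];
    foreach ($lines as $line) {
        if (strpos($line, ':') === false) {
            continue; // skip malformed header lines
        }
        [$name, $value] = explode(':', $line, 2);
        // Header names are case-insensitive, so store them lowercased
        $headers[strtolower(trim($name))] = trim($value);
    }

    return ['status' => $status, 'headers' => $headers, 'body' => $body];
}

// Usage with a hand-written sample response
$raw = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>";
$res = split_http_response($raw);
echo $res['status'], "\n";
echo $res['headers']['content-type'], "\n";
echo $res['body'], "\n";
```

Passing only the body part on to get_content_object() keeps header values from being mistaken for page content.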