How does PHP use the same domain name to crawl remote web content from multiple IPs?

伊谢尔伦 (Original)
2017-07-17 10:30:19

When one domain name corresponds to multiple IPs, PHP offers several functions for fetching the content of a remote web page:
file_get_contents simply reads the whole resource in one call and encapsulates all the operations.
fopen also does some encapsulation, but you have to read in a loop to get all the data.
fsockopen is a raw, low-level socket operation.
If you only need to read an HTML page, file_get_contents is the better choice.
If the company accesses the Internet through a firewall, a plain file_get_contents call will not work. It is possible to write the HTTP request to the proxy directly with socket operations, but that is more troublesome.
If you know the file is small, either fopen or join('', file($file)) will do. For example, if you only handle files smaller than 1 KB, file_get_contents is the best choice.
If you know the file is large, or you cannot determine its size, it is best to use a file stream. Opening a 1 KB file and opening a 1 GB file with fopen makes no noticeable difference; longer content just takes longer to read, rather than making the script die.
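
As a minimal sketch of the file-stream approach, the loop below reads a source in fixed-size chunks with fopen, fread and feof. The function name readInChunks, the callback parameter and the 8192-byte chunk size are illustrative assumptions, not part of the original article; $source may be a local path or, when allow_url_fopen is enabled, an http:// URL.

```php
<?php
// Hypothetical helper: read $source chunk by chunk and pass each chunk to $onChunk.
// Returns the total number of bytes read, or false if the source cannot be opened.
function readInChunks($source, callable $onChunk, $chunkSize = 8192)
{
    $handle = @fopen($source, 'rb');
    if ($handle === false) {
        return false;
    }
    $total = 0;
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // read error, or nothing more to read
        }
        $total += strlen($chunk);
        $onChunk($chunk);
    }
    fclose($handle);
    return $total;
}

// Usage: walk through a temporary 20000-byte file without loading it in one piece.
$tmp = tempnam(sys_get_temp_dir(), 'chunk');
file_put_contents($tmp, str_repeat('a', 20000));
$chunks = 0;
$bytes = readInChunks($tmp, function ($chunk) use (&$chunks) {
    $chunks++;
});
unlink($tmp);
echo $bytes, "\n";
```

Only one chunk is held in memory at a time, so memory use stays roughly constant however large the source is.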

PHP has many ways to fetch remote web content, for example built-in functions such as file_get_contents and fopen.

<?php 
echo file_get_contents("http://php.cn/abc.php");
?>

However, with load balancing such as DNS round robin, the same domain name may correspond to multiple servers and multiple IPs. Assume that php.cn is resolved by DNS to three IPs: 72.249.146.213, 72.249.146.214, and 72.249.146.215. Every time a user visits http://php.cn/abc.php, the request goes to one of these servers, chosen by the load-balancing algorithm.
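
To look up the addresses behind such a domain instead of hard-coding them, PHP's built-in gethostbynamel returns the list of IPv4 addresses a host name resolves to. The wrapper below is a minimal sketch; the name resolveAllIps is an assumption made for illustration, and the result depends on what the local resolver returns at that moment.

```php
<?php
// Hypothetical helper: return every IPv4 address $host resolves to,
// or an empty array if the lookup fails.
function resolveAllIps($host)
{
    $ips = gethostbynamel($host);
    return $ips === false ? array() : $ips;
}
```

Calling resolveAllIps('php.cn') performs a real DNS lookup and gives a list that can then be looped over, one request per server.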
When I was working on a video project last week, I ran into this requirement: I needed to call a PHP interface script (assume it is abc.php) on each server in turn to query that server's transmission status. In this case you cannot simply use file_get_contents on http://php.cn/abc.php, because it may hit the same server repeatedly.
Visiting http://72.249.146.213/abc.php, http://72.249.146.214/abc.php, and http://72.249.146.215/abc.php in sequence does not work either when the web server on each of the three machines is configured with multiple virtual hosts. Editing the local hosts file does not help, because hosts cannot map multiple IPs to the same domain name.
This can be achieved with PHP at the HTTP protocol level: connect to each IP directly and, when requesting abc.php, send the php.cn domain name in the Host header. So I wrote the following PHP function:

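<?php
/************************
 * Purpose: fetch remote web content from a specific server when one
 *          domain name corresponds to multiple IPs
 * Parameters:
 *   $ip   IP address of the server
 *   $host host name of the server
 *   $url  URL path on the server (without the domain name)
 * Return value:
 *   the remote web content that was fetched
 *   false if the remote page could not be accessed
 ************************/
function HttpVisit($ip, $host, $url)
{
    $errstr = '';
    $errno = 0;
    $fp = fsockopen($ip, 80, $errno, $errstr, 90);
    if (!$fp) {
        return false;
    }

    // Connect to the chosen IP and name the virtual host in the Host header.
    // Note: an HTTP/1.1 server may answer with chunked transfer encoding,
    // which this function does not decode.
    $out = "GET {$url} HTTP/1.1\r\n";
    $out .= "Host: {$host}\r\n";
    $out .= "Connection: close\r\n\r\n";
    fputs($fp, $out);

    // Read until the server closes the connection
    $response = '';
    while (!feof($fp)) {
        $chunk = fread($fp, 4096);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $response .= $chunk;
    }
    fclose($fp);

    // Strip the response headers
    $pos = strpos($response, "\r\n\r\n");
    if ($pos === false) {
        return false;
    }
    return substr($response, $pos + 4);
}

// Usage:
$server_info1 = HttpVisit("72.249.146.213", "php.cn", "/abc.php");
$server_info2 = HttpVisit("72.249.146.214", "php.cn", "/abc.php");
$server_info3 = HttpVisit("72.249.146.215", "php.cn", "/abc.php");
?>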
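
The same idea can also be sketched with file_get_contents, by requesting the IP-based URL and supplying the Host header through a stream context. This is an alternative under stated assumptions, not the author's method: the function names are hypothetical, allow_url_fopen must be enabled, and it relies on the http stream wrapper sending a caller-supplied Host header in place of the one it would derive from the URL.

```php
<?php
// Hypothetical helper: options for the http:// wrapper that name the virtual host.
function buildHostContextOptions($host, $timeout = 90)
{
    return array(
        'http' => array(
            'method'  => 'GET',
            'header'  => "Host: {$host}\r\nConnection: close\r\n",
            'timeout' => $timeout,
        ),
    );
}

// Hypothetical helper: fetch $path from the server at $ip as virtual host $host.
// Returns the response body, or false on failure.
function fetchFromIp($ip, $host, $path)
{
    $context = stream_context_create(buildHostContextOptions($host));
    return @file_get_contents("http://{$ip}{$path}", false, $context);
}
```

A call such as fetchFromIp("72.249.146.213", "php.cn", "/abc.php") would then play the role of HttpVisit, with the wrapper taking care of the response headers.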

The following function uses fsockopen to send a POST request to a URL and returns the complete response, including headers and body:

<?php
function HTTP_Post($URL, $data, $cookie, $referrer = "")
{
    // Parse the given URL
    $URL_Info = parse_url($URL);

    // Build the referrer: if none is given, fall back to a placeholder value
    if ($referrer == "") {
        $referrer = "111";
    }

    // Build the form-encoded body from $data
    $values = array();
    foreach ($data as $key => $value) {
        $values[] = "$key=" . urlencode($value);
    }
    $data_string = implode("&", $values);

    // Find out which port is needed; if not given, use the standard one (80)
    if (!isset($URL_Info["port"])) {
        $URL_Info["port"] = 80;
    }
    // Request the root path if the URL has none
    if (!isset($URL_Info["path"])) {
        $URL_Info["path"] = "/";
    }

    // Build the POST request (HTTP header lines end with \r\n)
    $request = "POST " . $URL_Info["path"] . " HTTP/1.1\r\n";
    $request .= "Host: " . $URL_Info["host"] . "\r\n";
    $request .= "Referer: $referrer\r\n";
    $request .= "Content-Type: application/x-www-form-urlencoded\r\n";
    $request .= "Content-Length: " . strlen($data_string) . "\r\n";
    $request .= "Connection: close\r\n";
    $request .= "Cookie: $cookie\r\n";
    $request .= "\r\n";
    $request .= $data_string;

    $fp = fsockopen($URL_Info["host"], $URL_Info["port"]);
    if (!$fp) {
        return false;
    }
    fputs($fp, $request);

    // Read the full response, headers and body
    $result = "";
    while (!feof($fp)) {
        $result .= fgets($fp, 1024);
    }
    fclose($fp);
    return $result;
}
?>
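
Because the function above returns headers and body together, a caller usually needs to separate them. The following is a minimal sketch, assuming the response uses CRLF line endings and is not chunk-encoded; the function name splitHttpResponse and the shape of the returned array are illustrative assumptions.

```php
<?php
// Hypothetical helper: split a raw HTTP response into status code, headers and body.
// Returns false if no header/body separator is found.
function splitHttpResponse($raw)
{
    $pos = strpos($raw, "\r\n\r\n");
    if ($pos === false) {
        return false;
    }
    $headerLines = explode("\r\n", substr($raw, 0, $pos));
    $body = (string) substr($raw, $pos + 4);

    // The first line is the status line, e.g. "HTTP/1.1 200 OK"
    $statusLine = array_shift($headerLines);
    $parts = explode(' ', $statusLine, 3);
    $status = isset($parts[1]) ? (int) $parts[1] : 0;

    // Remaining lines are "Name: value" pairs; names are lower-cased for lookup
    $headers = array();
    foreach ($headerLines as $line) {
        $kv = explode(':', $line, 2);
        if (count($kv) === 2) {
            $headers[strtolower(trim($kv[0]))] = trim($kv[1]);
        }
    }
    return array('status' => $status, 'headers' => $headers, 'body' => $body);
}

// Usage with a hand-written sample response
$parsed = splitHttpResponse("HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<p>hi</p>");
echo $parsed['status'], ' ', $parsed['headers']['content-type'], "\n"; // prints: 200 text/html
```

If a header name appears more than once, this sketch keeps only the last value.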
