
Analyzing website response time with PHP and R

巴扎黑
Original article, 2016-11-12 15:28:23

The task is to build a web crawler that captures specific content from web pages. A senior graduate student wrote one before, but our teacher thought it was too slow: it ran for a whole night and collected only a few thousand records. This time I was asked to do it. I wanted to run a feasibility study first, measuring the site's response time and using the R language for the statistics.

There are two difficulties in this experiment, or really only one: how to represent the data in a standard form. I had never used PHP to read and write files before; this is my first time. The thing to consider is how often the file is read and written. Although this is only an experiment, efficiency still matters: reading and writing the file too frequently, and spending too much time on disk operations, is a real problem. In the end it comes down to the data format, that is, the format in which the data is stored, and this has to take the later processing in R into account. R can process plain text, with the values separated by a delimiter such as a comma or a tab. The data in the file is therefore meant to be comma-separated.
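To illustrate the point about write frequency, here is a minimal sketch (not the code used in this experiment) that builds all the comma-separated rows in memory and writes them to disk in a single call. The helper name rows_to_text, the sample values, and the temporary file are assumptions made for illustration.

```php
<?php
// Hypothetical helper: turn measurements into comma-separated lines,
// one row per line, so the whole file can be written in a single call.
function rows_to_text(array $rows): string
{
    $lines = array();
    foreach ($rows as $row) {
        // (array) lets a single number stand for a one-column row.
        $lines[] = implode(",", (array) $row);
    }
    return implode("\n", $lines) . "\n";
}

// Made-up sample values, not measurements from the experiment.
$rows = array(array(1, 0.182), array(2, 0.205), array(3, 0.197));

// One write to disk instead of one write per measurement.
$path = tempnam(sys_get_temp_dir(), "resp");
file_put_contents($path, rows_to_text($rows));
echo file_get_contents($path);
unlink($path);
```

Collecting the rows first keeps the disk out of the timing loop; the trade-off is that nothing is saved if the script dies before the final write.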

First, the PHP code:
<?php
include("php_lib/LIB_http.php");      // provides http_get_withheader()
error_reporting(E_ALL ^ E_NOTICE);    // report everything except notices
$target = "http://www.*****";
$ref = "http://www.*****";
$filename = 'sitevisitors.txt';       // not used below; results go to data.txt

$first = microtime(true);             // true: return the timestamp as a float
$count = array();
for ($n = 0; $n < 5000; $n++) {
  $betime = microtime(true);
  $return_arry = http_get_withheader($target, $ref);
  $finidown = microtime(true);
  $resulttime = $finidown - $betime;  // seconds taken by this request
  $count[$n] = $resulttime;
  //echo $count[$n]."\n";
  echo "\n".$n;                       // progress indicator
}
$fp = fopen("data.txt", "a");
//fputs($fp, "$count[0]");
for ($n = 0; $n < 5000; $n++) {
  fputs($fp, "\r\n".$count[$n]);      // one value per line
}
$last = microtime(true);
$result = $last - $first;
fclose($fp);
echo "\nend this test";
echo "\n the time is:".$result;
?>

Since it is inconvenient to publish this website, the connection address and host address have been replaced with *; please understand. The program sends 5000 HTTP requests and records the time each one takes in an array of 5000 elements. The HTTP message seems to carry this time as well, but I can't remember clearly, so I use the microtime() function. Note that microtime() must be called with its get_as_float argument set to true so that it returns a float and the two timestamps can be subtracted, and that error_reporting(E_ALL^E_NOTICE), placed after include("php_lib/LIB_http.php");, suppresses all PHP notices.
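The timing technique on its own can be sketched as follows. This is an illustration only: the helper name time_call is an assumption, and usleep() stands in for the real HTTP request so that the snippet runs without a network.

```php
<?php
// Hypothetical helper: return the wall-clock seconds taken by $fn.
function time_call(callable $fn): float
{
    $start = microtime(true);   // true => float seconds since the Unix epoch
    $fn();
    return microtime(true) - $start;
}

// Stand-in for one HTTP request (an assumption): sleep for about 10 ms.
$elapsed = time_call(function () {
    usleep(10000);
});
printf("%.6f\n", $elapsed);

// Without the argument, microtime() returns a string of the form
// "microseconds seconds", which cannot be subtracted directly.
var_dump(is_string(microtime()), is_float(microtime(true)));
```

Passing the request as a callable keeps the measurement code separate from whatever HTTP library is actually used.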

All the data is written to the data.txt file. Note that the file should be laid out as a matrix: even if there is only one data source, that is, only one column, each value must be on its own row. The values cannot be written consecutively on one line, for example as 1,2,3,4..., but should be:
1
2
3
4
...

The reason is the R language, which reads and writes data as a matrix, so writing the file this way is the most convenient (there may be a better way, but I don't know it).

After collecting the times, open the R environment and compute the statistics:
① Read the data:
data<-read.table("data.txt",header=FALSE,sep=",",col.names=c('num'))
② Find the average:
mean(data[,1])
Note that it cannot be mean(data); otherwise the result is NA, with the following warning:
[1] NA
Warning message:
In mean.default(data) : argument is not numeric or logical: returning NA
data[,1] selects the first column of the data frame data (there is actually only one column here, but it still has to be written this way).
③ Draw a scatter plot. The precision of the axis labels is too low to tell the values apart, so this needs further study:
c<-data[,1]
# cbind (not rbind) so that the values form two columns, x and y
mydata<-cbind(c,c)
mydata<-as.data.frame(mydata)
names(mydata)<-c("x","y")
with(mydata,plot(x,y,pch=19,main="the result"))

The figure was drawn, but the axis labels show only two digits after the decimal point. I am still looking into how to increase the precision of the axis labels; options(digits) did not help. Something to keep thinking about.
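As a cross-check on the average that R reports in step ②, the same value can be computed on the PHP side from a one-value-per-line file. The sketch below is an assumption-laden illustration: the helper name mean_of_file and the sample numbers are made up, and a temporary file is used instead of the real data.txt.

```php
<?php
// Hypothetical helper: average the numbers in a one-value-per-line file.
function mean_of_file(string $path): float
{
    $values = array();
    foreach (file($path) as $line) {
        $line = trim($line);          // drops "\r\n" and surrounding spaces
        if ($line !== "") {           // skip blank lines
            $values[] = (float) $line;
        }
    }
    if (count($values) === 0) {
        throw new InvalidArgumentException("no values found in $path");
    }
    return array_sum($values) / count($values);
}

// Made-up sample with a line break before every value,
// so the file starts with a blank line.
$path = tempnam(sys_get_temp_dir(), "resp");
file_put_contents($path, "\r\n0.25\r\n0.5\r\n0.75");
echo mean_of_file($path), "\n";
unlink($path);
```

If the two averages disagree, the file format (stray blank lines, a wrong delimiter) is the first thing to check.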
