
Get all links by reading the source files of a site

WBOY
Original
2016-07-25 09:11:10
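This script fetches the HTML source of a page, uses a regular expression to find every href attribute in it, and prints each link as an absolute URL. Links without "://" are treated as relative: those starting with "/" are resolved against the site root, and the rest against the directory of the current page.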
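The regular-expression step can be tried on its own, without fetching anything. The following is a minimal sketch that runs an href pattern over a small hard-coded HTML string; the sample markup and the variable names `$html`, `$matches`, and `$links` are illustrative assumptions, not part of the original script.

```php
<?php
// Hypothetical sample markup standing in for a fetched page.
$html = '<a href="/index.html">Home</a> <a href=\'news/today.html\'>News</a> '
      . '<a href="http://example.com/about">About</a>';

// Optional opening quote, then capture everything up to the next
// space or quote. The U modifier makes the quantifiers ungreedy.
$pattern = "|href=['\"]?([^ '\"]+)['\" ]|U";
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

// With PREG_SET_ORDER each match is an array:
// [0] is the full match, [1] is the captured link.
$links = array();
foreach ($matches as $m) {
    $links[] = $m[1];
}
print_r($links);
```

Note that this pattern expects a space or a quote after the link, so an unquoted attribute followed directly by `>` would not be captured cleanly.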
<?php
/********** qiushuiwuhen (2002-5-20) ***********/
if (empty($url)) $url = "http://www.csdn.net/expert/"; // Default URL to scan
$site = substr($url, 0, strpos($url, "/", 8));    // Site root (scheme + host); assumes a "/" follows the host
$base = substr($url, 0, strrpos($url, "/") + 1);  // Directory of the current page
$fp = fopen($url, "r");                           // Open the URL (requires allow_url_fopen)
$contents = "";
while (!feof($fp)) $contents .= fread($fp, 1024); // Read the whole page
$pattern = "|href=['\"]?([^ '\"]+)['\" ]|U";
preg_match_all($pattern, $contents, $regArr, PREG_SET_ORDER); // Match every href=
for ($i = 0; $i < count($regArr); $i++) {
    // No "://" means a relative path (strpos replaces eregi(), which was removed in PHP 7)
    if (strpos($regArr[$i][1], "://") === false) {
        if (substr($regArr[$i][1], 0, 1) == "/")  // Starts with "/": relative to the site root
            echo "link" . ($i + 1) . ":" . $site . $regArr[$i][1] . "\n";
        else                                      // Otherwise relative to the current directory
            echo "link" . ($i + 1) . ":" . $base . $regArr[$i][1] . "\n";
    } else {                                      // Already an absolute URL
        echo "link" . ($i + 1) . ":" . $regArr[$i][1] . "\n";
    }
}
fclose($fp);
?>
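The three-way rule used in the loop (absolute, root-relative, directory-relative) can also be isolated in a function so it can be checked without network access. This is only a sketch under stated assumptions: `resolve_link` is a hypothetical helper name, and `$url` is assumed to be a full `http://` or `https://` URL.

```php
<?php
// Hypothetical helper: applies the listing's three-way rule to a single link.
function resolve_link($link, $url)
{
    // Site root is scheme + host. Offset 8 skips past "http://" or "https://".
    $slash = strpos($url, "/", 8);
    $site  = ($slash === false) ? $url : substr($url, 0, $slash);
    // Directory of the page; fall back to the site root when the URL has no path.
    $base  = ($slash === false) ? $url . "/" : substr($url, 0, strrpos($url, "/") + 1);

    if (strpos($link, "://") !== false) {
        return $link;                 // Already an absolute URL
    }
    if (substr($link, 0, 1) == "/") {
        return $site . $link;         // Relative to the site root
    }
    return $base . $link;             // Relative to the current directory
}

echo resolve_link("/faq.html", "http://www.csdn.net/expert/"), "\n";
echo resolve_link("topic.asp", "http://www.csdn.net/expert/"), "\n";
echo resolve_link("http://example.com/", "http://www.csdn.net/expert/"), "\n";
```

Like the original script, this sketch does not normalize `../` segments and does not special-case `mailto:` links, fragment-only links such as `#top`, or protocol-relative links beginning with `//`.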