PHP hyperlink crawling implementation code

WBOY, 2016-07-21 15:45:13

Regular expression test for extracting standard HTML hyperlink parameters
I am building something similar to a specialized search engine, so I need to extract every hyperlink from a web page.
Please help me test whether the code below catches all standard hyperlinks.
The test code is as follows:

<?php
// ---------------------------------------------------------------------------
// File name: Noname1.php
// Description: Regular expression test for extracting standard link parameters
// Requirement: PHP4 (http://www.php.net)
// Copyright(C), HonestQiao, 2005, All Rights Reserved.
// Author: HonestQiao (honestqiao@hotmail.com)
// Parameter description:
//   $strSource: HTML page containing standard links
//   $strResult: match results
// Additional notes:
//   A standard link is one written as an <a href="...">...</a> tag
// ---------------------------------------------------------------------------
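// Sample input (hypothetical URLs): the links vary quoting style,
// attribute order and letter case.
$strSource = <<<HTML
<a href="http://www.example.com/1.html">t1</a>
<a href='http://www.example.com/2.html' title='second'>t2</a>
<a title="third" href=http://www.example.com/3.html>t3</a>
<A HREF="http://www.example.com/4.html"
   TITLE="fourth">t4</A>
HTML;

// Capture groups: 1 = href value, 2 = title value (empty if absent),
// 3 = link text. The lookaheads let href and title appear in either order.
preg_match_all('/<a\s(?=[^>]*?\bhref\s*=\s*["\']?([^"\'\s>]+))(?:(?=[^>]*?\btitle\s*=\s*["\']([^"\']*)["\'])|)[^>]*>(.*?)<\/a>/si', $strSource, $strResult, PREG_PATTERN_ORDER);
$count = count($strResult[1]);
for ($i = 0; $i < $count; $i++)
{
    printf("%d href=(%s) title=(%s)\n", $i, $strResult[1][$i], $strResult[2][$i]);
}
?>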
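You can also wrap the matching step in a function that returns href/title pairs, which makes it easier to check cases one at a time. The sketch below assumes PHP 7 or later. `extract_links()` is a hypothetical helper name. Its pattern is illustrative and is not the author's original expression. It places the `href` and `title` captures inside lookaheads, so attribute order does not matter.

```php
<?php
// Hypothetical helper: extract href/title pairs from <a> tags with one regex.
// Assumes PHP 7+. This is not a full HTML parser: comments, scripts and
// '>' inside attribute values are not handled specially.
function extract_links(string $html): array
{
    $pattern = '/<a\s'
        // group 1: href value, found anywhere inside the opening tag
        . '(?=[^>]*?\bhref\s*=\s*["\']?([^"\'\s>]+))'
        // group 2: title value; the empty alternative makes it optional
        . '(?:(?=[^>]*?\btitle\s*=\s*["\']([^"\']*)["\'])|)'
        // rest of the opening tag, then the link text up to </a>
        . '[^>]*>.*?<\/a>/si';

    $links = [];
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $links[] = [
                'href'  => $m[1],
                // PREG_SET_ORDER omits a trailing unmatched group, so default to ''
                'title' => $m[2] ?? '',
            ];
        }
    }
    return $links;
}

// Usage: quoting style, attribute order and letter case vary between links.
$html = <<<HTML
<a href="/a.html">A</a>
<a title='Second' href='/b.html'>B</a>
<A HREF=/c.html TITLE="Third">C</A>
HTML;

foreach (extract_links($html) as $i => $link) {
    printf("%d href=(%s) title=(%s)\n", $i, $link['href'], $link['title']);
}
```

A pattern like this still cannot handle everything HTML allows. For example, `[^>]*?` cannot scan past a `>` inside a quoted attribute value, so a tag such as `<a title="x>y" href="/z">` is skipped.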

If you have test data made of standard links that this code does not handle, please send me the data together with your test environment.
Thank you.
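For comparison, PHP's DOM extension can collect the same href/title pairs. This is a parser-based technique, not the regular expression used above. The sketch assumes PHP 7 or later with ext-dom enabled. `extract_links_dom()` is a hypothetical name.

```php
<?php
// Hypothetical helper: collect href/title pairs using PHP's DOM extension.
// Assumes PHP 7+ with ext-dom (libxml) available.
function extract_links_dom(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is often malformed; collect libxml warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if (!$a->hasAttribute('href')) {
            continue; // skip anchors without href, e.g. <a name="top">
        }
        $links[] = [
            'href'  => $a->getAttribute('href'),
            'title' => $a->getAttribute('title'), // '' when absent
        ];
    }
    return $links;
}

// Usage: attribute order, quoting and letter case differ between the links.
$html = '<p><a href="/a.html">A</a> <A TITLE="Second" HREF=\'/b.html\'>B</A></p>';
foreach (extract_links_dom($html) as $i => $link) {
    printf("%d href=(%s) title=(%s)\n", $i, $link['href'], $link['title']);
}
```

A parser handles attribute order and quoting without special cases in the pattern. The cost is building a document tree for every page.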
