curl-based data collection: using the parallel collection function get_htmls (PHP tutorial)

The first article used get_html() to implement simple data collection. Because the pages are fetched one after another, the total transfer time is the sum of the download times of all pages: if one page takes 1 second, 10 pages take 10 seconds. Fortunately, curl also supports parallel processing.
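As a rough illustration of that arithmetic, the sketch below compares the two cases under idealized conditions. The per-page times are assumed figures, not measurements, and a real parallel run is also limited by bandwidth and by the remote server.

```php
<?php
// Assumed per-page download times in seconds (illustrative, not measured)
$pageTimes = array_fill(0, 10, 1.0);

// Sequential: each download starts only after the previous one finishes
$sequential = array_sum($pageTimes);

// Parallel (idealized): all downloads overlap, so the slowest page dominates
$parallel = max($pageTimes);

printf("sequential: %.1fs, parallel: about %.1fs\n", $sequential, $parallel);
// prints "sequential: 10.0s, parallel: about 1.0s"
```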

To write a parallel collection function, you first need to work out what kind of pages you want to collect and what kind of requests those pages need. Only then can you write a reasonably general-purpose function.


Functional requirements analysis:

Return what?

The HTML of each page, collected into an array.

What parameters are passed?

When writing get_html(), we learned that an options array can be used to pass additional curl parameters, so the multi-page function must keep this feature.

What type of parameters?

Whether you are requesting the HTML of a web page or calling a web API, GET and POST requests usually target the same page or interface and differ only in their parameters. That suggests the following signature:

get_htmls($url,$options);

$url is a string.

$options is a two-dimensional array: one array of curl parameters per page.

That seems to solve the problem. However, I searched the curl manual and could not find an option for passing GET parameters, so for GET requests the URLs have to be passed as an array, and a method parameter is added to tell the two cases apart.
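GET parameters travel in the query string of the URL itself, which is why the manual lists no separate option for them. The sketch below shows one way to build such a URL array from parameter sets using PHP's built-in http_build_query(). The helper name build_get_urls() is hypothetical and not part of the original function.

```php
<?php
// Hypothetical helper (an assumption for this sketch): builds one URL per
// parameter set by appending an encoded query string to the base URL.
function build_get_urls($base, $paramSets) {
    $urls = array();
    foreach ($paramSets as $key => $params) {
        $urls[$key] = $base . '?' . http_build_query($params);
    }
    return $urls;
}

$urls = build_get_urls('http://www.baidu.com/s', array(
    array('wd' => 'shili', 'pn' => 0,  'ie' => 'utf-8'),
    array('wd' => 'shili', 'pn' => 10, 'ie' => 'utf-8'),
));
// $urls[0] is 'http://www.baidu.com/s?wd=shili&pn=0&ie=utf-8'
```

The resulting array can be passed as the $urls argument of a GET call.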


The function prototype is therefore get_htmls($urls, $options = array(), $method = 'get'). The code is as follows:


function get_htmls($urls, $options = array(), $method = 'get'){
    $mh = curl_multi_init();
    $curls = array();
    $htmls = array();
    if($method == 'get'){ // GET is the most common way to pass values
        // $urls is an array of URLs; $options is shared by every request
        foreach($urls as $key=>$url){
            $ch = curl_init($url);
            $options[CURLOPT_RETURNTRANSFER] = true;
            $options[CURLOPT_TIMEOUT] = 5;
            curl_setopt_array($ch,$options);
            $curls[$key] = $ch;
            curl_multi_add_handle($mh,$curls[$key]);
        }
    }elseif($method == 'post'){ // POST
        // $urls is a single URL string; $options holds one option array per request
        foreach($options as $key=>$option){
            $ch = curl_init($urls);
            $option[CURLOPT_RETURNTRANSFER] = true;
            $option[CURLOPT_TIMEOUT] = 5;
            $option[CURLOPT_POST] = true;
            curl_setopt_array($ch,$option);
            $curls[$key] = $ch;
            curl_multi_add_handle($mh,$curls[$key]);
        }
    }else{
        exit("Parameter error!\n");
    }
    do{
        $mrc = curl_multi_exec($mh,$active);
        curl_multi_select($mh); // Wait for activity to reduce CPU load; without this call the loop busy-waits
    }while($active);
    foreach($curls as $key=>$ch){
        $html = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh,$ch);
        curl_close($ch);
        $htmls[$key] = $html;
    }
    curl_multi_close($mh);
    return $htmls;
}
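get_htmls() returns whatever content arrived, so a timed-out or failed page simply shows up as an empty entry. The sketch below is one possible way to also report the per-transfer result code using curl_multi_info_read(). It covers GET requests only, and the function name get_htmls_checked() and the shape of the returned array are assumptions made for illustration.

```php
<?php
// Hypothetical GET-only variant of get_htmls() (name and return shape are
// assumptions): returns array('html' => ..., 'errno' => ...) for each URL.
function get_htmls_checked($urls, $options = array()) {
    $mh = curl_multi_init();
    $curls = array();
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        // "+" keeps the caller's value when the same option is set twice
        curl_setopt_array($ch, $options + array(
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_TIMEOUT => 5,
        ));
        $curls[$key] = $ch;
        curl_multi_add_handle($mh, $ch);
    }
    do {
        $status = curl_multi_exec($mh, $active);
        if ($active && curl_multi_select($mh, 1.0) === -1) {
            usleep(1000); // select failed: pause briefly instead of spinning
        }
    } while ($active && $status === CURLM_OK);

    // Each finished transfer queues one message carrying its result code
    $codes = array();
    while ($info = curl_multi_info_read($mh)) {
        foreach ($curls as $key => $ch) {
            if ($ch === $info['handle']) {
                $codes[$key] = $info['result']; // CURLE_OK (0) means success
            }
        }
    }
    $results = array();
    foreach ($curls as $key => $ch) {
        $results[$key] = array(
            'html'  => curl_multi_getcontent($ch),
            'errno' => isset($codes[$key]) ? $codes[$key] : null,
        );
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

A caller can then skip or retry any entry whose 'errno' is not 0 rather than processing an empty page.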


A typical GET request varies only the parameters in the URL. Because our function is meant for data collection, which usually proceeds category by category, the URLs look something like this:

http://www.baidu.com/s?wd=shili&pn=0&ie=utf-8

http://www.baidu.com/s?wd=shili&pn=10&ie=utf-8

http://www.baidu.com/s?wd=shili&pn=20&ie=utf-8

http://www.baidu.com/s?wd=shili&pn=30&ie=utf-8

http://www.baidu.com/s?wd=shili&pn=40&ie=utf-8

The five pages above follow a regular pattern: only the value of pn changes.

The code is as follows:

$urls = array();
for($i = 1; $i <= 5; $i++){ // five pages: pn = 0, 10, 20, 30, 40
    $urls[] = 'http://www.baidu.com/s?wd=shili&pn='.(($i-1)*10).'&ie=utf-8';
}
$option[CURLOPT_USERAGENT] = 'Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0';
$htmls = get_htmls($urls,$option);
foreach($htmls as $html){
    echo $html; // The HTML is available here for further data processing
}
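What "data processing" means depends on the pages being collected. As one minimal sketch, the snippet below pulls the title out of each collected page with a regular expression. The helper extract_title() and the sample HTML strings are hypothetical, and a real project might prefer a proper HTML parser.

```php
<?php
// Hypothetical helper: returns the trimmed <title> text, or null if none is found
function extract_title($html) {
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return trim(html_entity_decode($m[1], ENT_QUOTES, 'UTF-8'));
    }
    return null;
}

// Sample input standing in for the array returned by get_htmls()
$htmls = array(
    '<html><head><title> Page one </title></head><body></body></html>',
    '<html><body>no title here</body></html>',
);
$titles = array_map('extract_title', $htmls);
// $titles is array('Page one', null)
```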

Simulating a common POST request:

Write a post.php file as follows:


<?php
if(isset($_POST['username']) && isset($_POST['password'])){
    echo 'The username is: '.$_POST['username'].' The password is: '.$_POST['password'];
}else{
    echo 'Request error!';
}

Then call as follows:

$url = 'http://localhost/yourpath/post.php'; // Replace with your own path
$options = array();
for($i = 1; $i <= 5; $i++){ // the request count of 5 is assumed here
    $option[CURLOPT_POSTFIELDS] = 'username=user'.$i.'&password=pass'.$i;
    $options[] = $option;
}
$htmls = get_htmls($url,$options,'post');
foreach($htmls as $html){
    echo $html; // The HTML is available here for further data processing
}
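The POST body above is concatenated by hand, which works for simple values but breaks if a value contains characters such as & or a space. A hedged alternative is to build each body with http_build_query(), which URL-encodes the values. The helper name build_post_options() and the sample credentials are assumptions for illustration.

```php
<?php
// Hypothetical helper: builds one curl option array per username/password pair
function build_post_options($credentials) {
    $options = array();
    foreach ($credentials as $username => $password) {
        $options[] = array(
            // http_build_query() URL-encodes each value
            CURLOPT_POSTFIELDS => http_build_query(array(
                'username' => $username,
                'password' => $password,
            )),
        );
    }
    return $options;
}

$options = build_post_options(array(
    'user1' => 'pass1',
    'user2' => 'p&ss word', // special characters are encoded safely
));
// $options[1][CURLOPT_POSTFIELDS] is 'username=user2&password=p%26ss+word'
```

The resulting $options array has the same two-dimensional shape that the 'post' branch of get_htmls() expects.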

With that, the get_htmls function covers the basic needs of data collection.

That's all for today. If anything is poorly written or unclear, please let me know.
