PHP's cURL library makes web crawling simple and effective

WBOY
Original, 2016-07-15 13:22:47

PHP's cURL library lets you crawl web pages simply and effectively: run a script, parse the pages it fetched, and you have the data you want programmatically. Whether you need to pull part of the data from a link, fetch an XML file and import it into a database, or simply retrieve the content of a web page, cURL is a powerful tool. This article describes how to use it from PHP.

Enable cURL settings

First, check whether your PHP installation has the cURL library enabled. You can get this information with the phpinfo() function.

<?php
phpinfo();
?>

If the output page contains a "curl" section that reports cURL support as enabled, the library is available.

If you do not see it, you need to configure PHP to enable the library. On Windows this is simple: open your php.ini file, find the line that mentions php_curl.dll, and remove the semicolon in front of it, as shown below:

; Remove the leading semicolon from the following line
extension=php_curl.dll

On Linux, if you build PHP from source, you need to recompile it with the "--with-curl" option added to the configure command. Many distributions also ship cURL support as a separate package that can be installed without recompiling.
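As an alternative to reading the phpinfo() page, the same check can be done from a script. The following is a minimal sketch; the helper name curlAvailable() is made up for illustration and is not part of PHP.

```php
<?php
// Report whether the cURL extension is available to this PHP runtime.
// curlAvailable() is a hypothetical helper name used only in this sketch.
function curlAvailable(): bool
{
    // extension_loaded() checks the list of loaded extensions;
    // function_exists() also catches setups where curl_init is disabled.
    return extension_loaded('curl') && function_exists('curl_init');
}

if (curlAvailable()) {
    // curl_version() returns an array; 'version' holds the libcurl version string.
    echo 'cURL is enabled, libcurl version ', curl_version()['version'], PHP_EOL;
} else {
    echo 'cURL is not enabled', PHP_EOL;
}
```

Running this from the command line checks the CLI configuration, which may use a different php.ini than the web server does.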

A small example

With everything in place, here is a small example:

<?php
// Initialize a cURL handle
$curl = curl_init();

// Set the URL to fetch
curl_setopt($curl, CURLOPT_URL, 'http://cocre.com');

// Include the response headers in the output
curl_setopt($curl, CURLOPT_HEADER, 1);

// Return the result as a string instead of printing it to the screen
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);

// Run cURL and request the web page
$data = curl_exec($curl);

// Close the cURL handle
curl_close($curl);

// Display the obtained data
var_dump($data);
?>
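The example above does not check whether the request succeeded, and because CURLOPT_HEADER is on, the returned string contains the headers and the body together. The sketch below shows one way to handle both. The function names fetchPage() and splitResponse() and the 10-second timeout are assumptions made for illustration.

```php
<?php
// Split a raw response (headers followed by body) at the given header size.
function splitResponse(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        'body'    => substr($raw, $headerSize),
    ];
}

// Fetch a URL and return ['status' => int, 'headers' => string, 'body' => string].
// Throws RuntimeException when the transfer itself fails.
function fetchPage(string $url): array
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_HEADER, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_TIMEOUT, 10); // assumed limit, in seconds

    $raw = curl_exec($curl);
    if ($raw === false) {
        // curl_exec() returns false on failure; curl_error() describes why.
        $message = curl_error($curl);
        curl_close($curl);
        throw new RuntimeException('cURL request failed: ' . $message);
    }

    // Ask cURL for the HTTP status code and the byte length of the header block.
    $status     = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    $headerSize = curl_getinfo($curl, CURLINFO_HEADER_SIZE);
    curl_close($curl);

    return ['status' => $status] + splitResponse($raw, $headerSize);
}
```

Calling fetchPage('http://www.example.com') would then give the status code, headers, and body as separate array entries.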

How to POST data

The code above fetches a web page; the code below POSTs data to one. Suppose we have a form-handling URL, http://www.example.com/sendSMS.php, which accepts two form fields: a phone number and the text message content.

<?php
$phoneNumber = '13912345678';
$message = 'This message was generated by curl and php';

// Build the URL-encoded form body
$curlPost = 'pNUMBER=' . urlencode($phoneNumber) . '&MESSAGE=' . urlencode($message) . '&SUBMIT=Send';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com/sendSMS.php');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $curlPost);

// Pass the handle to curl_exec(), then release it
$data = curl_exec($ch);
curl_close($ch);
?>

As the program above shows, CURLOPT_POST switches the request from the GET method to the HTTP POST method, and CURLOPT_POSTFIELDS supplies the POST data.
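The POST body built by hand above can also be produced with PHP's built-in http_build_query() function, which URL-encodes every field. This is a minimal sketch; the wrapper name buildPostBody() is an assumption made for illustration.

```php
<?php
// Build an application/x-www-form-urlencoded body from an associative array.
// buildPostBody() is a hypothetical wrapper around http_build_query().
function buildPostBody(array $fields): string
{
    // The explicit '&' separator avoids depending on the
    // arg_separator.output ini setting.
    return http_build_query($fields, '', '&');
}

$curlPost = buildPostBody([
    'pNUMBER' => '13912345678',
    'MESSAGE' => 'This message was generated by curl and php',
    'SUBMIT'  => 'Send',
]);

echo $curlPost, PHP_EOL;
// pNUMBER=13912345678&MESSAGE=This+message+was+generated+by+curl+and+php&SUBMIT=Send
```

The resulting string can be passed to CURLOPT_POSTFIELDS as before. Note that passing an array to CURLOPT_POSTFIELDS instead of a string makes cURL send the data as multipart/form-data, which not every form handler expects.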

About proxy server

Here is an example of how to use a proxy server. The proxy-specific lines are the CURLOPT_HTTPPROXYTUNNEL, CURLOPT_PROXY, and CURLOPT_PROXYUSERPWD options; the rest is the same as before.

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

// Proxy settings: tunnel through the proxy, its address, and its credentials
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
curl_setopt($ch, CURLOPT_PROXY, 'fakeproxy.com:1080');
curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:password');

$data = curl_exec($ch);
curl_close($ch);
?>
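When the same proxy is used for many requests, the proxy options can be collected once and applied with curl_setopt_array(). The following is a sketch under that assumption; proxyOptions() is a hypothetical helper, and the proxy address and credentials are the placeholder values from the example above.

```php
<?php
// Collect the proxy-related options in one array.
// proxyOptions() is a hypothetical helper used only in this sketch.
function proxyOptions(string $proxy, ?string $userPwd = null): array
{
    $options = [
        CURLOPT_PROXY           => $proxy,
        CURLOPT_HTTPPROXYTUNNEL => true,
    ];
    // Only send credentials when the proxy actually needs them.
    if ($userPwd !== null) {
        $options[CURLOPT_PROXYUSERPWD] = $userPwd;
    }
    return $options;
}

// Create a handle for $url with the given options already applied.
function handleWithOptions(string $url, array $options)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt_array($ch, $options);
    return $ch;
}
```

A call such as handleWithOptions('http://www.example.com', proxyOptions('fakeproxy.com:1080', 'user:password')) returns a handle that is ready for curl_exec().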

About SSL and Cookies

For SSL, that is, the HTTPS protocol, you only need to change http:// to https:// in the CURLOPT_URL value. There is also an option, CURLOPT_SSL_VERIFYHOST, that controls whether cURL checks that the name in the server's certificate matches the host you are connecting to.
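As an illustration, the sketch below creates a handle for an HTTPS URL with certificate checking switched on explicitly. It also sets CURLOPT_SSL_VERIFYPEER, a related option not discussed in this article that controls verification of the certificate itself. The helper name secureHandle() is an assumption.

```php
<?php
// Create a handle for an HTTPS URL with certificate checks explicitly on.
// secureHandle() is a hypothetical helper used only in this sketch.
function secureHandle(string $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Verify the server certificate against the trusted certificate authorities.
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
    // 2 = require the name in the certificate to match the host in the URL.
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
    return $ch;
}
```

Passing the handle returned by secureHandle('https://www.example.com') to curl_exec() should then fail if the certificate cannot be verified, rather than silently continuing.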

For cookies, you need to know the following three options:

CURLOPT_COOKIE: sets the cookies to send with the current request

CURLOPT_COOKIEJAR: names the file that cookies are saved to when the handle is closed

CURLOPT_COOKIEFILE: names the file that cookies are read from.
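The three cookie options can be combined as in the following sketch. The helper names formatCookies() and cookieHandle() and the cookie file path passed in by the caller are assumptions made for illustration.

```php
<?php
// Join name/value pairs into the "name=value; name2=value2" form that
// CURLOPT_COOKIE expects. Assumes names and values contain no ';'.
function formatCookies(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

// Create a handle that sends the given cookies and keeps the cookies the
// server sets in the file at $jarPath (hypothetical helper).
function cookieHandle(string $url, array $cookies, string $jarPath)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_COOKIE, formatCookies($cookies));
    curl_setopt($ch, CURLOPT_COOKIEFILE, $jarPath); // read stored cookies from here
    curl_setopt($ch, CURLOPT_COOKIEJAR, $jarPath);  // write cookies here on close
    return $ch;
}

echo formatCookies(['session' => 'abc123', 'lang' => 'en']), PHP_EOL;
// session=abc123; lang=en
```

Using the same file for CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR lets a later run of the script pick up the cookies that an earlier run received, for example to stay logged in between requests.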

HTTP server authentication

Finally, let’s take a look at HTTP server authentication.

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

// Use HTTP Basic authentication with the given credentials
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_setopt($ch, CURLOPT_USERPWD, '[username]:[password]');

$data = curl_exec($ch);
curl_close($ch);
?>
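To make clear what CURLAUTH_BASIC does with the CURLOPT_USERPWD value, the sketch below builds the equivalent request header by hand. It is for illustration only; cURL generates this header itself, and the function name basicAuthHeader() is an assumption.

```php
<?php
// Build the header that Basic authentication sends:
// "Basic " followed by the base64 encoding of "username:password".
function basicAuthHeader(string $user, string $password): string
{
    return 'Authorization: Basic ' . base64_encode($user . ':' . $password);
}

echo basicAuthHeader('user', 'password'), PHP_EOL;
```

Because base64 is an encoding and not encryption, anyone who can read the request can recover the credentials, so Basic authentication is best combined with an https:// URL.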

For more information, please refer to the relevant cURL manual.

