Goutte-master: Web scraper PHP class
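<?php
class Curl
{
    public $cookieJar = "";
    // cURL handle used by setup(); assumed to be created with curl_init() before setup() runs.
    public $curl;

    public function __construct($cookieJarFile = 'cookies.txt')
    {
        $this->cookieJar = $cookieJarFile;
    }

    // Apply browser-like request headers and a user agent to the cURL handle.
    function setup()
    {
        $header = array();
        $header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
        $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
        $header[] = "Cache-Control: max-age=0";
        $header[] = "Connection: keep-alive";
        $header[] = "Keep-Alive: 300";
        $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
        $header[] = "Accept-Language: en-us,en;q=0.5";
        $header[] = "Pragma:"; // browsers keep this blank.
        curl_setopt($this->curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7');
        // The listing is cut off after the line above. Attaching the headers
        // is an assumed completion; the rest of the class is not shown.
        curl_setopt($this->curl, CURLOPT_HTTPHEADER, $header);
    }
}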

First, you send a GET or POST request to the specified URL.

Next, you receive the HTML returned as the response.

Finally, you parse the returned HTML to extract the text you want to crawl.
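
The three steps above can be sketched in plain PHP. This is a minimal illustration using the cURL and DOM extensions, not Goutte's own API. The helper names `fetchHtml` and `extractText` are hypothetical and are not part of the library or of the `Curl` class in the listing.

```php
<?php
// Hypothetical helpers for illustration only; not part of Goutte.

// Steps 1 and 2: send a GET request and return the response body as HTML.
function fetchHtml(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return $html;
}

// Step 3: parse the HTML and collect the trimmed text of every node
// matched by the given XPath expression.
function extractText(string $html, string $xpath): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings from imperfect markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ((new DOMXPath($doc))->query($xpath) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}
```

With these in place, a call such as `extractText(fetchHtml($url), '//h2')` would return the text of every `h2` heading on the page, where `$url` is whatever page you intend to crawl. Keeping the request and the parsing in separate functions lets the parsing step be tried on a saved HTML string without any network access.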

