
PHP data collection tutorials, part one: the cURL function library

WBOY
2016-07-21 15:40:54

First, write a simple page-fetching function.

The code is as follows:

<?php
function GetSources($Url, $User_Agent = '', $Referer_Url = '') // Fetch a specified page
{
    // $Url: the address of the page to fetch
    // $User_Agent: the User-Agent string to send, such as "baiduspider" or "googlebot"
    // $Referer_Url: the value to send in the Referer header
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $Url);
    curl_setopt($ch, CURLOPT_USERAGENT, $User_Agent);
    curl_setopt($ch, CURLOPT_REFERER, $Referer_Url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the page instead of printing it
    $MySources = curl_exec($ch);
    curl_close($ch);
    return $MySources;
}
$Url = "http://www.jb51.net"; // The page whose content you want to fetch
$User_Agent = "baiduspider+(+http://www.baidu.com/search/spider.htm)";
$Referer_Url = 'http://www.jb51.net/';
echo GetSources($Url, $User_Agent, $Referer_Url);
?>




The cURL function library in PHP (Client URL Library functions)
curl_close: close a curl session;
curl_copy_handle: copy a curl handle along with all of its options;
curl_errno: return the error number of the last operation in the current session;
curl_error: return a string containing the last error for the current session;
curl_exec: execute a curl session;
curl_getinfo: get information about a curl handle's transfer;
curl_init: initialize a curl session;
curl_multi_add_handle: add an individual curl handle to a curl batch (multi) handle;
curl_multi_close: close a batch handle;
curl_multi_exec: run the transfers of the handles in a curl batch handle;
curl_multi_getcontent: return the content of a curl handle as a string if CURLOPT_RETURNTRANSFER is set;
curl_multi_info_read: get information about the transfers in the batch handle;
curl_multi_init: initialize a curl batch handle;
curl_multi_remove_handle: remove an individual handle from a curl batch handle;
curl_multi_select: wait for activity on any of the connections in the batch handle;
curl_setopt_array: set several options for a curl session from an array;
curl_setopt: set one option for a curl session;
curl_version: get curl version information;
curl_init() initializes a curl session. Its only parameter is optional and gives the URL to fetch;
curl_exec() executes a curl session. Its only parameter is the handle returned by curl_init();
curl_close() closes a curl session. Its only parameter is the handle returned by curl_init();
The code is as follows:

<?php
$ch = curl_init("http://blog.huangchao.org/");
curl_exec($ch);
curl_close($ch);
?>


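curl_version() returns curl version information. It historically accepted one optional parameter whose purpose is unclear; that parameter was deprecated in PHP 7.4 and removed in PHP 8.0, so the function can simply be called without arguments;
The code is as follows:

<?php
print_r(curl_version());
?>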
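The array printed above can also be read field by field. The following is a minimal sketch, assuming the usual 'version', 'ssl_version' and 'protocols' keys are present in the result; describe_curl() is a hypothetical helper name that does not appear in the original article.

```php
<?php
// Summarise the array returned by curl_version().
// describe_curl() is a hypothetical helper used only for this sketch.
function describe_curl(): string
{
    $info = curl_version();
    // 'protocols' lists the URL schemes this libcurl build supports.
    $hasHttps = in_array('https', $info['protocols'], true);
    return sprintf(
        'libcurl %s, SSL: %s, HTTPS support: %s',
        $info['version'],
        $info['ssl_version'] !== '' ? $info['ssl_version'] : 'none',
        $hasHttps ? 'yes' : 'no'
    );
}

echo describe_curl(), PHP_EOL;
```

Checking 'protocols' before fetching an https:// page is one way to find out early that the local build lacks SSL support.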
curl_getinfo() returns information about a curl handle's transfer. It takes two parameters: the first is the curl handle, and the optional second one is one of the constants listed below. Without the second parameter, an array with all the values is returned:
The code is as follows:

<?php
$ch = curl_init("http://blog.huangchao.org/");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the page body out of the output
curl_exec($ch); // most values are only filled in after a transfer
print_r(curl_getinfo($ch));
?>

The optional constants include:
CURLINFO_EFFECTIVE_URL: the last effective URL;
CURLINFO_HTTP_CODE: the last HTTP status code received;
CURLINFO_FILETIME: the remote time of the retrieved document; the value is -1 if it cannot be determined;
CURLINFO_TOTAL_TIME: the total time of the last transfer;
CURLINFO_NAMELOOKUP_TIME: the time spent on name resolution;
CURLINFO_CONNECT_TIME: the time spent establishing the connection;
CURLINFO_PRETRANSFER_TIME: the time from the start until just before the transfer begins;
CURLINFO_STARTTRANSFER_TIME: the time from the start until the first byte is about to be transferred;
CURLINFO_REDIRECT_TIME: the time spent on all redirect steps before the final transaction started;
CURLINFO_SIZE_UPLOAD: the total number of bytes uploaded;
CURLINFO_SIZE_DOWNLOAD: the total number of bytes downloaded;
CURLINFO_SPEED_DOWNLOAD: the average download speed;
CURLINFO_SPEED_UPLOAD: the average upload speed;
CURLINFO_HEADER_SIZE: the total size of the headers received;
CURLINFO_HEADER_OUT: the request string that was sent. For this to work, the CURLINFO_HEADER_OUT option must first be enabled on the handle with curl_setopt();
CURLINFO_REQUEST_SIZE: the total size of the requests issued, currently only for HTTP requests;
CURLINFO_SSL_VERIFYRESULT: the result of the SSL certificate verification requested by setting CURLOPT_SSL_VERIFYPEER;
CURLINFO_CONTENT_LENGTH_DOWNLOAD: the length of the downloaded content, read from the "Content-Length:" field;
CURLINFO_CONTENT_LENGTH_UPLOAD: the specified size of the uploaded content;
CURLINFO_CONTENT_TYPE: the "Content-Type:" value of the downloaded content; NULL means that the server did not send a valid "Content-Type:" header;
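Passing one of these constants as the second argument returns just that value. The sketch below assumes a transfer has already been attempted on the handle; fetch_with_info() and the keys of the array it returns are hypothetical names chosen for illustration.

```php
<?php
// fetch_with_info() is a hypothetical helper, not part of the cURL extension.
function fetch_with_info(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    $body = curl_exec($ch);

    // Ask for individual values instead of the whole info array.
    $summary = [
        'ok'           => $body !== false,
        'http_code'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'total_time'   => curl_getinfo($ch, CURLINFO_TOTAL_TIME),
        'content_type' => curl_getinfo($ch, CURLINFO_CONTENT_TYPE),
        'bytes'        => curl_getinfo($ch, CURLINFO_SIZE_DOWNLOAD),
    ];
    curl_close($ch);
    return $summary;
}
```

A call such as print_r(fetch_with_info('http://www.jb51.net/')); would then show the status code and timing for that page.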
curl_setopt() sets one option for a curl session. curl_setopt_array() sets several options at once, passed as an array;
The code is as follows:

<?php
$ch = curl_init();
$fp = fopen("example_homepage.txt", "w");
curl_setopt($ch, CURLOPT_FILE, $fp); // write the response to the file
$options = array(
    CURLOPT_URL => 'http://www.baidu.com/',
    CURLOPT_HEADER => false
);
curl_setopt_array($ch, $options);
curl_exec($ch);
curl_close($ch);
fclose($fp);
?>


The options that can be set are:
CURLOPT_AUTOREFERER: when enabled, the "Referer:" field is set automatically in requests that follow a "Location:" redirect;
CURLOPT_BINARYTRANSFER: when enabled together with CURLOPT_RETURNTRANSFER, the raw output is returned (this option has had no effect since PHP 5.1.3);
CURLOPT_COOKIESESSION: when enabled, the transfer is marked as a new cookie "session", and curl ignores the session cookies it would otherwise load from a previous session. By default curl stores and loads all cookies, whether they are session cookies or not. Session cookies are cookies without an expiry date, meant to exist only for the current session;


CURLOPT_CRLF: when enabled, Unix newlines are converted to CRLF newlines on transfers;
CURLOPT_DNS_USE_GLOBAL_CACHE: when enabled, a global DNS cache is used. This option is not thread-safe and is enabled by default;
CURLOPT_FAILONERROR: when enabled, the transfer fails if the HTTP status code returned is greater than or equal to 400. The default behavior is to return the page normally and ignore the code;
CURLOPT_FILETIME: when enabled, curl attempts to retrieve the modification time of the remote document. The value can be read with the CURLINFO_FILETIME option of curl_getinfo();
CURLOPT_FOLLOWLOCATION: when enabled, curl follows any "Location:" header that the server sends. This is recursive: curl follows every "Location:" header it receives unless CURLOPT_MAXREDIRS is set to limit the number;
CURLOPT_FORBID_REUSE: when enabled, the connection is closed explicitly once the transfer has finished and is not kept for reuse;
CURLOPT_FRESH_CONNECT: when enabled, a new connection is used instead of a cached one;
CURLOPT_FTP_USE_EPRT: TRUE to use EPRT (and LPRT) when doing active FTP downloads. Use FALSE to disable EPRT and LPRT and use PORT only. Added in PHP 5.0.0;
CURLOPT_FTP_USE_EPSV: TRUE to first try an EPSV command for FTP transfers before reverting back to PASV. Set to FALSE to disable EPSV;
CURLOPT_FTPAPPEND: TRUE to append to the remote file instead of overwriting it;
CURLOPT_FTPASCII: An alias of CURLOPT_TRANSFERTEXT. Use that instead;
CURLOPT_FTPLISTONLY: TRUE to only list the names of an FTP directory;
CURLOPT_HEADER: when enabled, the response headers are included in the output;
CURLOPT_HTTPGET: when enabled, the HTTP request method is reset to GET. Since GET is the default, this is only needed if the method has been changed;
CURLOPT_HTTPPROXYTUNNEL: when enabled, requests are tunneled through the given HTTP proxy;
CURLOPT_MUTE: when enabled, the cURL functions are completely silent. This option was removed in cURL 7.15.5; CURLOPT_RETURNTRANSFER can be used instead;
CURLOPT_NETRC: when enabled, after the connection is established curl scans the ~/.netrc file for the user name and password of the remote site;
CURLOPT_NOBODY: when enabled, the body is excluded from the output and the request method is set to HEAD;
CURLOPT_NOPROGRESS: when enabled, the progress meter for curl transfers is turned off. PHP sets this option to true by default;
CURLOPT_NOSIGNAL: when enabled, any cURL function that causes a signal to be sent to the PHP process is ignored. It is turned on by default in multi-threaded SAPIs so that timeout options can still be used;
CURLOPT_POST: when enabled, a regular HTTP POST is sent, of type application/x-www-form-urlencoded, as used by HTML forms;
CURLOPT_PUT: when enabled, a file is uploaded with HTTP PUT. The file must be set with CURLOPT_INFILE and CURLOPT_INFILESIZE;
CURLOPT_RETURNTRANSFER: when enabled, curl_exec() returns the transfer as a string instead of outputting it directly;


CURLOPT_SSL_VERIFYPEER: FALSE to stop cURL from verifying the peer's certificate. Alternate certificates to verify against can be specified with the CURLOPT_CAINFO option or a certificate directory can be specified with the CURLOPT_CAPATH option. CURLOPT_SSL_VERIFYHOST may also need to be TRUE or FALSE if CURLOPT_SSL_VERIFYPEER is disabled (it defaults to 2). TRUE by default as of cURL 7.10. Default bundle installed as of cURL 7.10;
CURLOPT_TRANSFERTEXT: TRUE to use ASCII mode for FTP transfers. For LDAP, it retrieves data in plain text instead of HTML. On Windows systems, it will not set STDOUT to binary mode;
CURLOPT_UNRESTRICTED_AUTH: when enabled, the user name and password keep being sent while following locations (with CURLOPT_FOLLOWLOCATION), even if the host name has changed;
CURLOPT_UPLOAD: when enabled, curl prepares for an upload;
CURLOPT_VERBOSE: when enabled, verbose information is reported, written to STDERR or to the file given by CURLOPT_STDERR;
CURLOPT_BUFFERSIZE: the size of the buffer to use for each read. There is no guarantee that this request will be fulfilled;
CURLOPT_CLOSEPOLICY: either CURLCLOSEPOLICY_LEAST_RECENTLY_USED or CURLCLOSEPOLICY_OLDEST. Three other values exist, but curl does not support them yet;
CURLOPT_CONNECTTIMEOUT: the number of seconds to wait while trying to connect. Use 0 to wait indefinitely;
CURLOPT_DNS_CACHE_TIMEOUT: the number of seconds to keep DNS entries in memory. The default is 120 seconds;
CURLOPT_FTPSSLAUTH: The FTP authentication method (when activated): CURLFTPAUTH_SSL (try SSL first), CURLFTPAUTH_TLS (try TLS first), or CURLFTPAUTH_DEFAULT (let cURL decide);
CURLOPT_HTTP_VERSION: the HTTP version for curl to use: CURL_HTTP_VERSION_NONE (let curl decide), CURL_HTTP_VERSION_1_0 (HTTP/1.0), or CURL_HTTP_VERSION_1_1 (HTTP/1.1);
CURLOPT_HTTPAUTH: the HTTP authentication method(s) to use. The options are CURLAUTH_BASIC, CURLAUTH_DIGEST, CURLAUTH_GSSNEGOTIATE, CURLAUTH_NTLM, CURLAUTH_ANY and CURLAUTH_ANYSAFE. The "|" operator can be used to combine several methods, in which case curl asks the server which methods it supports and picks the best one. CURLAUTH_ANY is equivalent to CURLAUTH_BASIC | CURLAUTH_DIGEST | CURLAUTH_GSSNEGOTIATE | CURLAUTH_NTLM, and CURLAUTH_ANYSAFE is equivalent to CURLAUTH_DIGEST | CURLAUTH_GSSNEGOTIATE | CURLAUTH_NTLM;
CURLOPT_INFILESIZE: the expected size, in bytes, of the file being uploaded;
CURLOPT_LOW_SPEED_LIMIT: the transfer speed, in bytes per second, that the transfer should be below during CURLOPT_LOW_SPEED_TIME seconds for PHP to consider the transfer too slow and abort;
CURLOPT_LOW_SPEED_TIME: the number of seconds the transfer should be below CURLOPT_LOW_SPEED_LIMIT for PHP to consider the transfer too slow and abort;
CURLOPT_MAXCONNECTS: the maximum number of connections allowed. When the limit is reached, CURLOPT_CLOSEPOLICY determines which connection to close;
CURLOPT_MAXREDIRS: the maximum number of HTTP redirects to follow. Use this option together with CURLOPT_FOLLOWLOCATION;


CURLOPT_PORT: an alternative port number to connect to;
CURLOPT_PROXYAUTH: The HTTP authentication method(s) to use for the proxy connection. Use the same bitmasks as described in CURLOPT_HTTPAUTH. For proxy authentication, only CURLAUTH_BASIC and CURLAUTH_NTLM are currently supported.
CURLOPT_PROXYPORT: The port number of the proxy to connect to. This port number can also be set in CURLOPT_PROXY.
CURLOPT_PROXYTYPE: Either CURLPROXY_HTTP (default) or CURLPROXY_SOCKS5.
CURLOPT_RESUME_FROM: the offset, in bytes, to resume a transfer from (used for resuming interrupted transfers).
CURLOPT_SSL_VERIFYHOST:
1 to check the existence of a common name in the SSL peer certificate (support for this value was removed in cURL 7.28.1).
2 to check the existence of a common name and also verify that it matches the hostname provided.
CURLOPT_SSLVERSION: The SSL version (2 or 3) to use. By default PHP will try to determine this itself, although in some cases this must be set manually.
CURLOPT_TIMECONDITION: how CURLOPT_TIMEVALUE is treated. Use CURL_TIMECOND_IFMODSINCE to return the page only if it has been modified since the time given in CURLOPT_TIMEVALUE; if it has not been modified, a "304 Not Modified" header is returned, assuming CURLOPT_HEADER is true. Use CURL_TIMECOND_IFUNMODSINCE for the reverse effect. The default is CURL_TIMECOND_IFMODSINCE.
CURLOPT_TIMEOUT: the maximum number of seconds that curl functions are allowed to run.
CURLOPT_TIMEVALUE: the timestamp used by CURLOPT_TIMECONDITION. By default, CURL_TIMECOND_IFMODSINCE is used.
CURLOPT_CAINFO: The name of a file holding one or more certificates to verify the peer with. This only makes sense when used in combination with CURLOPT_SSL_VERIFYPEER.
CURLOPT_CAPATH: A directory that holds multiple CA certificates. Use this option alongside CURLOPT_SSL_VERIFYPEER.
CURLOPT_COOKIE: The contents of the "Cookie:" header to be used in the HTTP request.
CURLOPT_COOKIEFILE: The name of the file containing cookie data. The cookie file can be in Netscape format, or just plain HTTP-style headers dumped into a file.
CURLOPT_COOKIEJAR: The name of a file in which to save the cookies when the handle is closed.


CURLOPT_CUSTOMREQUEST: A custom request method to use instead of "GET" or "HEAD" when doing a HTTP request. This is useful for doing "DELETE" or other, more obscure HTTP requests. Valid values are things like "GET", "POST", "CONNECT" and so on; i.e. do not enter a whole HTTP request line here. For instance, entering "GET /index.html HTTP/1.0\r\n\r\n" would be incorrect.
Note: Don't do this without making sure the server supports the custom request method first.
CURLOPT_EGDSOCKET: Like CURLOPT_RANDOM_FILE, except a filename to an Entropy Gathering Daemon socket.
CURLOPT_ENCODING: The contents of the "Accept-Encoding:" header. The supported encodings are "identity", "deflate" and "gzip". If an empty string is set, a header listing all supported encodings is sent.
CURLOPT_FTPPORT: The value which will be used to get the IP address to use for the FTP "PORT" instruction. The "PORT" instruction tells the remote server to connect to our specified IP address. The string may be a plain IP address, a hostname, a network interface name (under Unix), or just a plain '-' to use the system's default IP address.
CURLOPT_INTERFACE: The name of the outgoing network interface to use. This can be an interface name, an IP address or a host name.
CURLOPT_KRB4LEVEL: The KRB4 (Kerberos 4) security level, which can be one of the following values: "clear", "safe", "confidential", "private". The default value is "private". Setting the option to NULL disables KRB4 security. Currently KRB4 security only works with FTP transfers.
CURLOPT_POSTFIELDS: The full data to send in an HTTP "POST" operation. To post a file, prefix the file name with @. On PHP 5.5 and later, the CURLFile class is the preferred way to post a file.
CURLOPT_PROXY: The HTTP proxy to tunnel requests through.
CURLOPT_PROXYUSERPWD: The user name and password to use for the connection to the proxy, in the format "[username]:[password]".
CURLOPT_RANDOM_FILE: The name of a file used to seed the random number generator for SSL.
CURLOPT_RANGE: The range(s) of data to retrieve, in the form "X-Y". HTTP transfers also support several ranges separated by commas, in the form "X-Y,N-M".
CURLOPT_REFERER: The contents of the "Referer:" header to be used in the HTTP request.
CURLOPT_SSL_CIPHER_LIST: A list of ciphers to use for SSL. For example, RC4-SHA and TLSv1 are valid cipher lists.
CURLOPT_SSLCERT: The name of a file containing a PEM formatted certificate.


CURLOPT_SSLCERTPASSWD: The password required to use the CURLOPT_SSLCERT certificate.
CURLOPT_SSLCERTTYPE: The format of the certificate. Supported formats are "PEM" (default), "DER", and "ENG".
CURLOPT_SSLENGINE: The identifier for the crypto engine of the private SSL key specified in CURLOPT_SSLKEY.
CURLOPT_SSLENGINE_DEFAULT: The identifier for the crypto engine used for asymmetric crypto operations.
CURLOPT_SSLKEY: The name of a file containing a private SSL key.
CURLOPT_SSLKEYPASSWD: The secret password needed to use the private SSL key specified in CURLOPT_SSLKEY.
Note: Since this option contains a sensitive password, remember to keep the PHP script it is contained within safe.
CURLOPT_SSLKEYTYPE: The key type of the private SSL key specified in CURLOPT_SSLKEY. Supported key types are "PEM" (default), "DER", and "ENG".
CURLOPT_URL: The URL to fetch. It can also be set when initializing a session with curl_init().
CURLOPT_USERAGENT: The contents of the "User-Agent:" header to be used in the HTTP request.
CURLOPT_USERPWD: The user name and password to use for the connection, in the format "[username]:[password]".
CURLOPT_HTTP200ALIASES: An array of HTTP 200 responses that will be treated as valid responses and not as errors.
CURLOPT_HTTPHEADER: An array of HTTP header fields to send with the request.
CURLOPT_POSTQUOTE: An array of FTP commands to execute on the server after the FTP request has been performed.
CURLOPT_QUOTE: An array of FTP commands to execute on the server prior to the FTP request.
CURLOPT_FILE: The file that the transfer should be written to. The value is a resource. The default is STDOUT (the browser).
CURLOPT_INFILE: The file that the transfer should be read from when uploading. The value is a resource.
CURLOPT_STDERR: An alternative location to output errors to instead of STDERR. The value is a resource.
CURLOPT_WRITEHEADER: The file that the header part of the transfer is written to. The value is a resource.
CURLOPT_HEADERFUNCTION: A callback function with two parameters. The first is the curl resource handle, and the second is a string with the header data to be written. The header data must be written by this callback, which must return the number of bytes written.
CURLOPT_PASSWDFUNCTION: A callback function with three parameters. The first is the curl resource handle, the second is a string containing a password prompt, and the third is the maximum password length. It returns the string containing the password.
CURLOPT_READFUNCTION: A callback function that supplies the data to be sent. In current PHP versions it receives three parameters: the curl resource handle, the stream resource given to curl through CURLOPT_INFILE, and the maximum amount of data to read. It must return a string no longer than the amount requested, and an empty string to signal EOF.
CURLOPT_WRITEFUNCTION: A callback function with two parameters. The first is the curl resource handle, and the second is a string with the data to be written. The data must be saved by this callback, which must return the exact number of bytes written, or the transfer is aborted with an error.
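Several of the options above are commonly combined in a single curl_setopt_array() call. The following is a rough sketch of a form-style POST with redirect and timeout limits; build_post_options() is a hypothetical helper, and the field names, header and numeric limits are illustrative assumptions rather than values from the original article.

```php
<?php
// build_post_options() is a hypothetical helper; the option values are
// illustrative assumptions, not recommendations from the article.
function build_post_options(string $url, array $fields): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,                      // send a form-style POST
        CURLOPT_POSTFIELDS     => http_build_query($fields), // URL-encoded request body
        CURLOPT_RETURNTRANSFER => true,                      // return the response as a string
        CURLOPT_FOLLOWLOCATION => true,                      // follow "Location:" headers...
        CURLOPT_MAXREDIRS      => 5,                         // ...but at most five of them
        CURLOPT_CONNECTTIMEOUT => 10,                        // seconds to wait for the connection
        CURLOPT_TIMEOUT        => 30,                        // seconds allowed for the whole call
        CURLOPT_HTTPHEADER     => ['Accept: text/html'],
    ];
}

// Apply every option in one call. curl_setopt_array() returns false
// as soon as one of the options cannot be set.
$ch = curl_init();
$applied = curl_setopt_array($ch, build_post_options('http://www.baidu.com/', ['q' => 'curl']));
curl_close($ch);
```

Keeping the options in a plain array makes them easy to inspect or override before they are applied; calling curl_exec($ch) before curl_close($ch) would actually send the request.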


curl_copy_handle() copies a curl handle along with all of the options set on it.
The code is as follows:

<?php
$ch = curl_init("http://qzone.myqq.us/");
$another = curl_copy_handle($ch); // $another has the same URL and options as $ch
curl_exec($another);
curl_close($another);
?>
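One possible use of curl_copy_handle() is to configure a handle once and reuse it as a template for several requests. This is only a sketch under that assumption; make_template() and fetch_from_template() are hypothetical names, and the timeout value is arbitrary.

```php
<?php
// make_template() and fetch_from_template() are hypothetical helper names.
function make_template()
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // shared by every copy
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // arbitrary illustrative limit
    return $ch;
}

function fetch_from_template($template, string $url)
{
    $copy = curl_copy_handle($template);   // inherits every option set on the template
    curl_setopt($copy, CURLOPT_URL, $url); // changing the copy leaves the template untouched
    $body = curl_exec($copy);
    curl_close($copy);
    return $body; // string on success, false on failure
}
```

Each call to fetch_from_template() works on its own copy, so the template handle can be reused for as many URLs as needed.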

curl_error() returns a string containing the last error for the current session.
curl_errno() returns the error number of the last operation in the current session, or 0 if no error occurred.
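These two functions are typically checked right after curl_exec() returns false. A minimal sketch, assuming the caller prefers an exception over a false return value; fetch_or_fail() is a hypothetical helper and the timeout is an arbitrary choice.

```php
<?php
// fetch_or_fail() is a hypothetical helper written for this sketch.
function fetch_or_fail(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // arbitrary illustrative limit

    $body = curl_exec($ch);
    if ($body === false) {
        // Read both values before closing the handle.
        $code = curl_errno($ch);
        $message = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL error {$code}: {$message}", $code);
    }
    curl_close($ch);
    return $body;
}
```

The comparison uses === false on purpose: an empty page is a valid result and must not be mistaken for a failure.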
curl_multi_init() initializes a curl batch handle.
curl_multi_add_handle() adds an individual curl handle to a curl batch handle. It has two parameters: the first is the batch handle, and the second is the individual curl handle.
curl_multi_exec() runs the transfers of the handles in a curl batch handle. It has two parameters: the first is the batch handle, and the second is passed by reference and receives the number of individual handles that still need to be processed.
curl_multi_remove_handle() removes an individual handle from a curl batch handle. It has two parameters: the first is the batch handle, and the second is the individual curl handle.
curl_multi_close() closes a batch handle.
The code is as follows:

<?php
$ch1 = curl_init();
$ch2 = curl_init();
curl_setopt($ch1, CURLOPT_URL, "http://blog.huangchao.org/");
curl_setopt($ch1, CURLOPT_HEADER, 0);
curl_setopt($ch2, CURLOPT_URL, "http://test.huangchao.org/");
curl_setopt($ch2, CURLOPT_HEADER, 0);
$mh = curl_multi_init();
curl_multi_add_handle($mh, $ch1);
curl_multi_add_handle($mh, $ch2);
// Keep driving the transfers until none of them is still running
do {
    curl_multi_exec($mh, $flag);
} while ($flag > 0);
curl_multi_remove_handle($mh, $ch1);
curl_multi_remove_handle($mh, $ch2);
curl_multi_close($mh);
?>

curl_multi_getcontent() returns the content of a curl handle as a string, provided CURLOPT_RETURNTRANSFER is set on that handle.
curl_multi_info_read() returns information about the transfers in the batch handle, such as the result of each completed transfer.
curl_multi_select() waits for activity on any of the connections in the batch handle, much like select() on the associated sockets.
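The three functions just described can be combined with the batch functions into one loop. This is a sketch rather than a definitive pattern; fetch_all() is a hypothetical helper, and the one-second select timeout and the short sleep are arbitrary choices.

```php
<?php
// fetch_all() is a hypothetical helper; the keys of $urls are kept in the result.
function fetch_all(array $urls): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // needed for curl_multi_getcontent()
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    do {
        $status = curl_multi_exec($mh, $running);
        // Wait for socket activity instead of spinning; back off briefly
        // when curl_multi_select() reports -1 (nothing to wait on).
        if ($running > 0 && curl_multi_select($mh, 1.0) === -1) {
            usleep(1000);
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect the per-transfer result codes reported by the batch handle.
    $errors = [];
    while (($info = curl_multi_info_read($mh)) !== false) {
        $key = array_search($info['handle'], $handles, true);
        $errors[$key] = $info['result']; // CURLE_OK (0) means success
    }

    $results = [];
    foreach ($handles as $key => $ch) {
        $ok = ($errors[$key] ?? CURLE_OK) === CURLE_OK;
        $results[$key] = $ok ? curl_multi_getcontent($ch) : false;
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

Compared with a bare do/while around curl_multi_exec(), waiting in curl_multi_select() avoids keeping the CPU busy while the transfers are idle.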
