PHP cURL: simulating a login and fetching data (example)
cURL is a powerful library that PHP exposes through its curl extension. With it you can fetch web pages and collect their content simply and effectively, and by saving cookies you can simulate logging in to a site. cURL offers a rich set of functions, all of which are documented in the PHP manual. This article uses a simulated login to Open Source China (oschina) as its example.
Compared with file_get_contents(), PHP's cURL functions give finer control over a request (cookies, headers, timeouts) and can run several transfers concurrently through the curl_multi_* functions, which usually makes them the more efficient choice when crawling many pages. Note that the curl extension must be enabled before you can use them.
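The concurrency mentioned above comes from the curl_multi_* interface, which drives several transfers from a single thread rather than from real threads. The following is a minimal sketch, not part of the original article; the function name fetch_all() and the default timeout are assumptions made for illustration.

```php
<?php
// Hypothetical helper: fetch several URLs concurrently with curl_multi_*.
// Returns an array of response bodies keyed like the input array.
function fetch_all(array $urls, int $timeout = 10): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body instead of printing it
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running > 0 && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        // A failed transfer yields an empty string here; per-transfer error
        // codes would have to be read with curl_multi_info_read().
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);
    return $results;
}
```

A call such as fetch_all(array('a' => $url1, 'b' => $url2)) would return the two bodies under the keys 'a' and 'b'.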
Hands-on code
Let’s first look at the login part of the code:
// Simulate the login
function login_post($url, $cookie, $post)
{
    $curl = curl_init(); // initialize a cURL handle
    curl_setopt($curl, CURLOPT_URL, $url); // URL the login form is submitted to
    curl_setopt($curl, CURLOPT_HEADER, 0); // do not include the response headers in the output
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 0); // 0: print the response directly instead of returning it
    curl_setopt($curl, CURLOPT_COOKIEJAR, $cookie); // file in which the cookies are saved
    curl_setopt($curl, CURLOPT_POST, 1); // submit with POST
    curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($post)); // form data to submit
    curl_exec($curl); // execute the request
    curl_close($curl); // close the handle and free its resources
}
The login_post() function first calls curl_init(), then uses curl_setopt() to set the options: the URL to submit to, the file in which cookies are saved, the POST data (user name, password and so on), and whether the response is returned. curl_exec() then performs the request, and curl_close() finally releases the resources. Note that PHP's built-in http_build_query() converts an array into a URL-encoded query string.
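To make the role of http_build_query() concrete, here is a small sketch of what it produces for a login form. The field values are made-up placeholders, not real credentials, and the separator is passed explicitly so the result does not depend on the arg_separator.output ini setting.

```php
<?php
// Placeholder form data (assumed values, for illustration only).
$post = array(
    'email'      => 'user@example.com',
    'pwd'        => 'p@ss word',
    'save_login' => '1',
);

// Build an application/x-www-form-urlencoded body.
$body = http_build_query($post, '', '&');

echo $body, "\n"; // email=user%40example.com&pwd=p%40ss+word&save_login=1
```

Reserved characters such as "@" are percent-encoded and spaces become "+", which is the encoding a browser uses when it submits an ordinary HTML form.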
Next, assuming the login succeeded, we fetch a page that is only available after logging in.
// Fetch data after a successful login
function get_content($url, $cookie)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response as a string
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); // read the cookies saved at login
    $rs = curl_exec($ch); // execute the request and capture the page content
    curl_close($ch);
    return $rs;
}
get_content() follows the same pattern: initialize cURL, set the options, execute the request, and release the resources. Here CURLOPT_RETURNTRANSFER is set to 1 so that curl_exec() returns the response as a string instead of printing it, and CURLOPT_COOKIEFILE reads the cookies that were saved during login. The function returns the page content.
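When a login silently fails, it can help to look at what was actually saved in the cookie file. libcurl writes the jar in the tab-separated Netscape cookie format (seven fields per line, with HttpOnly cookies marked by a "#HttpOnly_" prefix), and it does so when the handle is released. The sketch below reads such a file into a name => value array; the function name read_cookie_jar() is an assumption, not something from the original article.

```php
<?php
// Hypothetical helper: read a Netscape-format cookie jar written by
// CURLOPT_COOKIEJAR and return the cookies as a name => value array.
function read_cookie_jar(string $path): array
{
    $cookies = [];
    $lines = @file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return $cookies; // missing or unreadable jar: nothing to report
    }
    foreach ($lines as $line) {
        if (strpos($line, '#HttpOnly_') === 0) {
            // HttpOnly cookies carry this prefix; strip it and parse as usual.
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line[0] === '#') {
            continue; // any other "#" line is a comment
        }
        // Fields: domain, subdomain flag, path, secure flag, expiry, name, value
        $fields = explode("\t", $line);
        if (count($fields) === 7) {
            $cookies[$fields[5]] = $fields[6];
        }
    }
    return $cookies;
}
```

Calling read_cookie_jar($cookie) after login_post() and checking that the array is not empty is a cheap sanity check. Which cookie name signals a logged-in session depends on the site and is not specified in this article.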
Our ultimate goal is to obtain information that is only available after a successful login. Next, we take logging in to the mobile version of Open Source China as an example to see how to capture it.
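// POST data for the login form
$post = array(
    'email' => 'your oschina account',
    'pwd' => 'your oschina password',
    'goto_page' => '/my',
    'error_page' => '/login',
    'save_login' => '1',
    'submit' => '现在登录' // value of the submit button ("Log in now")
);
// Login URL
$url = "http://m.oschina.net/action/user/login";
// Path of the cookie file
$cookie = dirname(__FILE__) . '/cookie_oschina.txt';
// URL of the page to fetch after logging in
$url2 = "http://m.oschina.net/my";
// Simulate the login
login_post($url, $cookie, $post);
// Fetch the page that requires a login
$content = get_content($url2, $cookie);
// Delete the cookie file
@unlink($cookie);
// Match the part of the page we want
$preg = "/<td class='portrait'>(.*)<\/td>/i";
preg_match_all($preg, $content, $arr);
// Print the first match, or nothing if the pattern did not match
$str = isset($arr[1][0]) ? $arr[1][0] : '';
echo $str;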
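The pattern in the script uses a greedy (.*), which runs to the last closing tag on the line and cannot span line breaks. A slightly more defensive variant is sketched below with a non-greedy group and the "s" modifier. The function name extract_portrait() and the sample HTML are assumptions for illustration; the real oschina markup may differ.

```php
<?php
// Hypothetical helper: return the contents of the first
// <td class='portrait'> cell, or null when the page has none.
function extract_portrait(string $html): ?string
{
    // (.*?) stops at the first </td>; the "s" flag lets "." match newlines.
    if (preg_match("/<td class='portrait'>(.*?)<\/td>/is", $html, $m) === 1) {
        return trim($m[1]);
    }
    return null;
}

// Made-up sample markup, not taken from the real site.
$sample = "<table><tr><td class='portrait'><img src='/a.png'></td><td>other</td></tr></table>";
echo extract_portrait($sample), "\n"; // <img src='/a.png'>
```

Returning null when nothing matches makes it easy to tell "the login failed and we got the login page back" apart from "the cell was empty".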
Usage summary
1. Initialize cURL with curl_init();
2. Use curl_setopt() to set the target URL and other options;
3. Execute the request with curl_exec();
4. Close the handle with curl_close();
5. Output the data.
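The steps above can be sketched as a single helper that also checks for failures, which the functions in this article do not do. This is only a sketch under stated assumptions: the name fetch_page(), the timeout values and the choice to throw exceptions are mine, not the article's. The handle is released with unset() because curl_close() has had no effect since PHP 8.0.

```php
<?php
// Hypothetical helper: the summary steps with basic error checks added.
function fetch_page(string $url, string $cookieFile, int $timeout = 15): string
{
    $ch = curl_init();                        // 1. initialize
    curl_setopt_array($ch, [                  // 2. set the URL and other options
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_COOKIEFILE     => $cookieFile,
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_TIMEOUT        => $timeout,
    ]);

    $body = curl_exec($ch);                   // 3. execute
    $error = $body === false
        ? curl_errno($ch) . ': ' . curl_error($ch)
        : null;
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

    unset($ch);                               // 4. release the handle

    if ($error !== null) {
        throw new RuntimeException('cURL error ' . $error);
    }
    if ($status >= 400) {
        throw new RuntimeException('HTTP status ' . $status);
    }
    return $body;                             // 5. hand the data back to the caller
}
```

A caller would wrap fetch_page($url2, $cookie) in try/catch and only run the pattern match when no exception was thrown.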
That is all for this article. I hope it helps with your studies.