Rumah >php教程 >php手册 >php中实现的一个curl批处理的实例 - 猿客

php中实现的一个curl批处理的实例 - 猿客

WBOY
WBOYasal
2016-05-20 11:39:221052semak imbas

curl是利用URL语法在命令行方式下工作的开源文件传输工具

本文在php中实现了的一个curl批处理的实例。

代码如下:

<span style="color: #008080;">  1</span> <span style="color: #008080;">header</span>("Content-Type:text/html;charset=utf8"<span style="color: #000000;">);
</span><span style="color: #008080;">  2</span> 
<span style="color: #008080;">  3</span> <span style="color: #008000;">/*</span><span style="color: #008000;"> 先获取两个页面的所有a标签 </span><span style="color: #008000;">*/</span>
<span style="color: #008080;">  4</span> <span style="color: #008000;">//</span><span style="color: #008000;"> 初始化两个简单处理句柄</span>
<span style="color: #008080;">  5</span> <span style="color: #800080;">$ch1</span> =<span style="color: #000000;"> curl_init();
</span><span style="color: #008080;">  6</span> <span style="color: #800080;">$ch2</span> =<span style="color: #000000;"> curl_init();
</span><span style="color: #008080;">  7</span> curl_setopt_array(<span style="color: #800080;">$ch1</span>,<span style="color: #0000ff;">array</span><span style="color: #000000;">(
</span><span style="color: #008080;">  8</span>     CURLOPT_URL => 'http://www.sina.com.cn',
<span style="color: #008080;">  9</span>     CURLOPT_HEADER => 0,
<span style="color: #008080;"> 10</span>     CURLOPT_RETURNTRANSFER => 1,
<span style="color: #008080;"> 11</span> <span style="color: #000000;">));
</span><span style="color: #008080;"> 12</span> curl_setopt_array(<span style="color: #800080;">$ch2</span>,<span style="color: #0000ff;">array</span><span style="color: #000000;">(
</span><span style="color: #008080;"> 13</span>     CURLOPT_URL => 'http://www.baidu.com/',
<span style="color: #008080;"> 14</span>     CURLOPT_HEADER => 0,
<span style="color: #008080;"> 15</span>     CURLOPT_RETURNTRANSFER => 1,
<span style="color: #008080;"> 16</span> <span style="color: #000000;">));
</span><span style="color: #008080;"> 17</span> 
<span style="color: #008080;"> 18</span> <span style="color: #008000;">//</span><span style="color: #008000;"> 初始化批处理句柄,并添加简单处理句柄</span>
<span style="color: #008080;"> 19</span> <span style="color: #800080;">$mh</span> =<span style="color: #000000;"> curl_multi_init();
</span><span style="color: #008080;"> 20</span> curl_multi_add_handle(<span style="color: #800080;">$mh</span>,<span style="color: #800080;">$ch1</span><span style="color: #000000;">);
</span><span style="color: #008080;"> 21</span> curl_multi_add_handle(<span style="color: #800080;">$mh</span>,<span style="color: #800080;">$ch2</span><span style="color: #000000;">);
</span><span style="color: #008080;"> 22</span> 
<span style="color: #008080;"> 23</span> <span style="color: #008000;">//</span><span style="color: #008000;"> 初始化执行状态</span>
<span style="color: #008080;"> 24</span> <span style="color: #800080;">$state</span> = <span style="color: #0000ff;">null</span><span style="color: #000000;">;
</span><span style="color: #008080;"> 25</span> 
<span style="color: #008080;"> 26</span> <span style="color: #008000;">//</span><span style="color: #008000;"> 执行批处理</span>
<span style="color: #008080;"> 27</span> <span style="color: #0000ff;">do</span><span style="color: #000000;">{
</span><span style="color: #008080;"> 28</span>     <span style="color: #800080;">$mc</span> = curl_multi_exec(<span style="color: #800080;">$mh</span>,<span style="color: #800080;">$state</span><span style="color: #000000;">);    
</span><span style="color: #008080;"> 29</span> }<span style="color: #0000ff;">while</span>(<span style="color: #800080;">$mc</span> ==<span style="color: #000000;"> CURLM_CALL_MULTI_PERFORM);
</span><span style="color: #008080;"> 30</span> <span style="color: #0000ff;">while</span>(<span style="color: #800080;">$mc</span> == CURLM_OK && <span style="color: #800080;">$state</span><span style="color: #000000;">) {
</span><span style="color: #008080;"> 31</span>     <span style="color: #0000ff;">while</span> (curl_multi_exec(<span style="color: #800080;">$mh</span>, <span style="color: #800080;">$state</span>) ===<span style="color: #000000;"> CURLM_CALL_MULTI_PERFORM);
</span><span style="color: #008080;"> 32</span>     <span style="color: #008000;">//</span><span style="color: #008000;"> 经过实验,发现curl_multi_select($mh)总是返回-1,意味着一下代码不会执行</span>
<span style="color: #008080;"> 33</span>     <span style="color: #0000ff;">if</span>(curl_multi_select(<span style="color: #800080;">$mh</span>) != -1<span style="color: #000000;">) {
</span><span style="color: #008080;"> 34</span>         <span style="color: #0000ff;">do</span><span style="color: #000000;">{
</span><span style="color: #008080;"> 35</span>             <span style="color: #800080;">$mc</span> = curl_multi_exec(<span style="color: #800080;">$mh</span>,<span style="color: #800080;">$state</span><span style="color: #000000;">);
</span><span style="color: #008080;"> 36</span>         }<span style="color: #0000ff;">while</span>(<span style="color: #800080;">$mc</span> ==<span style="color: #000000;"> CURLM_CALL_MULTI_PERFORM);
</span><span style="color: #008080;"> 37</span> <span style="color: #000000;">    }    
</span><span style="color: #008080;"> 38</span> <span style="color: #000000;">}
</span><span style="color: #008080;"> 39</span> 
<span style="color: #008080;"> 40</span> <span style="color: #008000;">//</span><span style="color: #008000;"> 获取内容</span>
<span style="color: #008080;"> 41</span> <span style="color: #800080;">$text</span>  = curl_multi_getcontent(<span style="color: #800080;">$ch1</span><span style="color: #000000;">);
</span><span style="color: #008080;"> 42</span> <span style="color: #800080;">$text</span> .= curl_multi_getcontent(<span style="color: #800080;">$ch2</span><span style="color: #000000;">);
</span><span style="color: #008080;"> 43</span> 
<span style="color: #008080;"> 44</span> <span style="color: #008000;">//</span><span style="color: #008000;"> 找到页面中所有的a标签,保存到$matches</span>
<span style="color: #008080;"> 45</span> <span style="color: #800080;">$matches</span> = <span style="color: #0000ff;">null</span><span style="color: #000000;">;
</span><span style="color: #008080;"> 46</span> <span style="color: #008080;">preg_match_all</span>("/<a.>(.*?)/",<span style="color: #800080;">$text</span>,<span style="color: #800080;">$matches</span><span style="color: #000000;">);
</span><span style="color: #008080;"> 47</span> 
<span style="color: #008080;"> 48</span> <span style="color: #008000;">//</span><span style="color: #008000;"> 关闭各个句柄</span>
<span style="color: #008080;"> 49</span> curl_multi_remove_handle(<span style="color: #800080;">$mh</span>,<span style="color: #800080;">$ch1</span><span style="color: #000000;">);
</span><span style="color: #008080;"> 50</span> curl_multi_remove_handle(<span style="color: #800080;">$mh</span>,<span style="color: #800080;">$ch2</span><span style="color: #000000;">);
</span><span style="color: #008080;"> 51</span> curl_multi_close(<span style="color: #800080;">$mh</span><span style="color: #000000;">);
</span><span style="color: #008080;"> 52</span> 
<span style="color: #008080;"> 53</span> <span style="color: #008000;">/*</span><span style="color: #008000;">在找到的连接中继续查找title标签 </span><span style="color: #008000;">*/</span>
<span style="color: #008080;"> 54</span> 
<span style="color: #008080;"> 55</span> <span style="color: #800080;">$handle</span> = <span style="color: #0000ff;">array</span>(); <span style="color: #008000;">//</span><span style="color: #008000;"> 存储简单处理句柄的数组</span>
<span style="color: #008080;"> 56</span> <span style="color: #800080;">$mhandle</span> = curl_multi_init(); <span style="color: #008000;">//</span><span style="color: #008000;">批处理句柄
</span><span style="color: #008080;"> 57</span> <span style="color: #008000;">// 处理100个页面</span>
<span style="color: #008080;"> 58</span> <span style="color: #0000ff;">foreach</span>(<span style="color: #008080;">array_slice</span>(<span style="color: #800080;">$matches</span>[1],0,100) <span style="color: #0000ff;">as</span> <span style="color: #800080;">$href</span><span style="color: #000000;">) {
</span><span style="color: #008080;"> 59</span>     <span style="color: #800080;">$tmp_h</span> =<span style="color: #000000;"> curl_init();
</span><span style="color: #008080;"> 60</span>     curl_setopt_array(<span style="color: #800080;">$tmp_h</span>,<span style="color: #0000ff;">array</span><span style="color: #000000;">(
</span><span style="color: #008080;"> 61</span>         CURLOPT_URL => <span style="color: #800080;">$href</span>,
<span style="color: #008080;"> 62</span>         CURLOPT_HEADER => 0,
<span style="color: #008080;"> 63</span>         CURLOPT_RETURNTRANSFER => 1,
<span style="color: #008080;"> 64</span> <span style="color: #000000;">    ));
</span><span style="color: #008080;"> 65</span>     curl_multi_add_handle(<span style="color: #800080;">$mhandle</span>,<span style="color: #800080;">$tmp_h</span><span style="color: #000000;">);
</span><span style="color: #008080;"> 66</span>     <span style="color: #800080;">$handle</span>[] = <span style="color: #800080;">$tmp_h</span><span style="color: #000000;">;
</span><span style="color: #008080;"> 67</span> <span style="color: #000000;">}
</span><span style="color: #008080;"> 68</span> <span style="color: #0000ff;">do</span><span style="color: #000000;">{
</span><span style="color: #008080;"> 69</span>     <span style="color: #800080;">$mrc</span> = curl_multi_exec(<span style="color: #800080;">$mhandle</span>,<span style="color: #800080;">$active</span><span style="color: #000000;">);
</span><span style="color: #008080;"> 70</span> }<span style="color: #0000ff;">while</span>(<span style="color: #800080;">$mrc</span> ==<span style="color: #000000;"> CURLM_CALL_MULTI_PERFORM);
</span><span style="color: #008080;"> 71</span> <span style="color: #0000ff;">while</span>(<span style="color: #800080;">$mrc</span> == CURLM_OK && <span style="color: #800080;">$active</span><span style="color: #000000;">) {
</span><span style="color: #008080;"> 72</span>     <span style="color: #0000ff;">while</span>(curl_multi_exec(<span style="color: #800080;">$mhandle</span>,<span style="color: #800080;">$active</span>) ==<span style="color: #000000;"> CURLM_CALL_MULTI_PERFORM);
</span><span style="color: #008080;"> 73</span>     <span style="color: #0000ff;">if</span>(curl_multi_select(<span style="color: #800080;">$mhandle</span>) != -1<span style="color: #000000;">) {
</span><span style="color: #008080;"> 74</span>         <span style="color: #0000ff;">do</span><span style="color: #000000;">{
</span><span style="color: #008080;"> 75</span>             <span style="color: #800080;">$mrc</span> = curl_multi_exec(<span style="color: #800080;">$mhandle</span>,<span style="color: #800080;">$active</span><span style="color: #000000;">);
</span><span style="color: #008080;"> 76</span>         }<span style="color: #0000ff;">while</span>(<span style="color: #800080;">$mrc</span> ==<span style="color: #000000;"> CURLM_CALL_MULTI_PERFORM);
</span><span style="color: #008080;"> 77</span> <span style="color: #000000;">    }
</span><span style="color: #008080;"> 78</span> <span style="color: #000000;">}
</span><span style="color: #008080;"> 79</span> 
<span style="color: #008080;"> 80</span> <span style="color: #008000;">//</span><span style="color: #008000;"> 获取这些页面的内容</span>
<span style="color: #008080;"> 81</span> <span style="color: #800080;">$mtext</span> = <span style="color: #0000ff;">null</span><span style="color: #000000;">;
</span><span style="color: #008080;"> 82</span> <span style="color: #0000ff;">foreach</span>(<span style="color: #800080;">$handle</span> <span style="color: #0000ff;">as</span> <span style="color: #800080;">$tmp_h</span><span style="color: #000000;">) {
</span><span style="color: #008080;"> 83</span>     <span style="color: #800080;">$mtext</span> .= curl_multi_getcontent(<span style="color: #800080;">$tmp_h</span><span style="color: #000000;">);
</span><span style="color: #008080;"> 84</span>     curl_multi_remove_handle(<span style="color: #800080;">$mhandle</span>, <span style="color: #800080;">$tmp_h</span><span style="color: #000000;">);
</span><span style="color: #008080;"> 85</span> <span style="color: #000000;">}
</span><span style="color: #008080;"> 86</span> <span style="color: #800080;">$mmatches</span> = <span style="color: #0000ff;">array</span><span style="color: #000000;">();
</span><span style="color: #008080;"> 87</span> <span style="color: #008080;">preg_match_all</span>("/<title>(.*?)/",<span style="color: #800080;">$mtext</span>, <span style="color: #800080;">$mmatches</span><span style="color: #000000;">);
</span><span style="color: #008080;"> 88</span> 
<span style="color: #008080;"> 89</span> <span style="color: #008000;">//</span><span style="color: #008000;"> 编码转换</span>
<span style="color: #008080;"> 90</span> mb_detect_order('GB2312,GBK,BIG5,GB18030,UNICODE ,CP936'<span style="color: #000000;">);
</span><span style="color: #008080;"> 91</span> <span style="color: #0000ff;">foreach</span>(<span style="color: #800080;">$mmatches</span>[1] <span style="color: #0000ff;">as</span> <span style="color: #800080;">$key</span> => <span style="color: #800080;">$val</span><span style="color: #000000;">) {
</span><span style="color: #008080;"> 92</span>     <span style="color: #800080;">$encoding</span> = mb_detect_encoding(<span style="color: #800080;">$val</span><span style="color: #000000;">);
</span><span style="color: #008080;"> 93</span>     <span style="color: #0000ff;">if</span>(<span style="color: #800080;">$encoding</span> != 'UTF-8' && <span style="color: #800080;">$encoding</span> != 'CP936' && <span style="color: #800080;">$encoding</span> != 'GB18030' && <span style="color: #800080;">$encoding</span> !=''<span style="color: #000000;">) {
</span><span style="color: #008080;"> 94</span>         <span style="color: #800080;">$mmatches</span>[1][<span style="color: #800080;">$key</span>] = <span style="color: #008080;">iconv</span>(<span style="color: #800080;">$encoding</span>,'UTF-8//IGNORE',<span style="color: #800080;">$val</span><span style="color: #000000;">);
</span><span style="color: #008080;"> 95</span> <span style="color: #000000;">    }
</span><span style="color: #008080;"> 96</span> <span style="color: #000000;">}
</span><span style="color: #008080;"> 97</span> 
<span style="color: #008080;"> 98</span> <span style="color: #008000;">//</span><span style="color: #008000;"> 打印title信息</span>
<span style="color: #008080;"> 99</span> <span style="color: #008080;">var_dump</span>(<span style="color: #800080;">$mmatches</span>[1<span style="color: #000000;">]);
</span><span style="color: #008080;">100</span> 
<span style="color: #008080;">101</span> <span style="color: #008000;">//</span><span style="color: #008000;"> 关闭批处理句柄</span>
<span style="color: #008080;">102</span> curl_multi_close(<span style="color: #800080;">$mhandle</span>);</title></a.>

 

Kenyataan:
Kandungan artikel ini disumbangkan secara sukarela oleh netizen, dan hak cipta adalah milik pengarang asal. Laman web ini tidak memikul tanggungjawab undang-undang yang sepadan. Jika anda menemui sebarang kandungan yang disyaki plagiarisme atau pelanggaran, sila hubungi admin@php.cn