


PHP development: using PHP to crawl millions of Zhihu users
Code hosting address: https://github.com/hhqcontinue/zhihuSpider
Preparation before development
Install a Linux system (Ubuntu 14.04); I ran Ubuntu inside a VMware virtual machine;
Install PHP 5.6 or above;
Install the curl and pcntl extensions.
Use PHP’s curl extension to capture page data
PHP’s curl extension is a library supported by PHP that allows you to connect and communicate with various servers using various types of protocols.
This program crawls Zhihu user data. To access a user's personal page, you must be logged in. When we click a user's avatar in the browser and land on their personal center page, the reason we can see the user's information is that the browser attaches our local cookies to the request when we follow the link. So before accessing personal pages, you need to obtain the user's cookie information and attach it to every curl request. As for obtaining the cookie, I simply used my own; you can see your own cookie information on the page (for example, via the browser's developer tools).
Copy the entries one by one into a cookie string of the form "__utma=?;__utmb=?;". This cookie string can then be used to send requests.
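For illustration, the cookie string can be assembled from the copied name/value pairs like this (the cookie names and values below are placeholders, not real Zhihu cookies):

```php
<?php
// Hypothetical cookie values copied from the browser's developer tools.
$cookies = array(
    '__utma' => '?',
    '__utmb' => '?',
    'z_c0'   => '?',
);

// Join the pairs into the "name=value;name=value" form curl expects.
$pairs = array();
foreach ($cookies as $name => $value) {
    $pairs[] = $name . '=' . $value;
}
$cookie_string = implode(';', $pairs);

echo $cookie_string . "\n"; // __utma=?;__utmb=?;z_c0=?
```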
Initial example:
$url = 'http://www.zhihu.com/people/mora-hu/about'; // mora-hu is the user ID
$ch = curl_init($url); // initialize the session
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_COOKIE, $this->config_arr['user_cookie']); // set the request cookie
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return curl_exec()'s result as a string instead of printing it directly
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$result = curl_exec($ch);
return $result; // the crawled page
Running the code above fetches the personal center page of the user mora-hu. You can then run regular expressions over the result to extract the name, gender, and whatever other fields you want to capture.
Image hotlink protection
When displaying the user information extracted from the regex results, I found that the user's avatar could not be shown on my page. It turned out that Zhihu protects its images from hotlinking. The solution is to forge a Referer in the request header when requesting the image.
After extracting the image link with a regular expression, send another request, this time carrying a Referer indicating that the request comes from the Zhihu website. A concrete example:
function getImg($url, $u_id)
{
    if (file_exists('./images/' . $u_id . ".jpg"))
    {
        return "images/$u_id" . '.jpg';
    }
    if (empty($url))
    {
        return '';
    }
    $context_options = array(
        'http' => array(
            'header' => "Referer: http://www.zhihu.com" // forge the Referer header
        )
    );

    $context = stream_context_create($context_options);
    $img = file_get_contents('http:' . $url, false, $context);
    file_put_contents('./images/' . $u_id . ".jpg", $img);
    return "images/$u_id" . '.jpg';
}
After capturing your own information, you need to visit the user's followers and followees lists to obtain more users, and then walk outward layer by layer. On the personal center page there are two such links: one for followees and one for followers.
Taking the "followees" link as an example, match the corresponding link with a regex; once you have the URL, use curl with the cookie to send another request and crawl the user's followees list page.
Analyzing the HTML structure of that page: since we only need the users' information, we only have to match the div block that contains it; the user names are inside.
A user's followees-page URL is almost identical across users; the only difference is the username. Use regex matching to get the list of user names, build the URLs one by one, and then send the requests one at a time. (Of course, one at a time is slow; a solution is discussed later.) After entering each new user's page, repeat the steps above and loop until you have as much data as you want.
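As a sketch, extracting user IDs from a list page might look like the following. The regex and the HTML fragment are made-up illustrations; the real pattern depends on Zhihu's actual markup at the time:

```php
<?php
// $html would be the crawled followees list page; here a tiny made-up fragment.
$html = '<a href="/people/mora-hu">mora-hu</a> <a href="/people/excited-vczh">vczh</a>';

// Hypothetical pattern: pull the user ID out of each /people/<id> link.
preg_match_all('#/people/([\w-]+)#', $html, $matches);
$user_list = array_unique($matches[1]);

foreach ($user_list as $user_id) {
    // Build the profile URL for each user, ready to be requested.
    $url = 'http://www.zhihu.com/people/' . $user_id . '/about';
    echo $url . "\n";
}
```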
Counting the number of files under Linux
After the script has been running for a while, you want to know how many images have been fetched. When there are many files, opening the folder to count them is slow. Since the script runs in a Linux environment, you can count the files with Linux commands:
ls -l | grep "^-" | wc -l
Here, ls -l prints a long listing of the files in the directory (the entries can be directories, links, device files, and so on); grep "^-" filters the listing, keeping only regular files (to keep only directories, use "^d" instead); and wc -l counts the number of lines in the output.
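As a quick sanity check, the pipeline can be exercised against a scratch directory (the path and file names here are made up for the demo):

```shell
# Build a scratch directory with three regular files and one subdirectory.
rm -rf /tmp/count_demo
mkdir -p /tmp/count_demo/sub
touch /tmp/count_demo/a.jpg /tmp/count_demo/b.jpg /tmp/count_demo/c.jpg

# Count only regular files; the subdirectory is filtered out by "^-".
count=$(ls -l /tmp/count_demo | grep "^-" | wc -l)
echo "$count"   # prints 3
```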
Handling duplicate data when inserting into MySQL
After the program had run for a while, I found that many users' data was duplicated, so duplicates have to be handled when inserting user data. The options are as follows:
1) Check whether the record already exists in the database before inserting;
2) Add a unique index and insert with INSERT INTO ... ON DUPLICATE KEY UPDATE ...
3) Add a unique index and insert with INSERT IGNORE INTO ...
4) Add a unique index and insert with REPLACE INTO ...
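Option 2 can be sketched as follows. This is only an illustration: the zhihu_user table, its columns, and the connection credentials are made up for the example, and a live MySQL server is assumed:

```php
<?php
// Illustration of option 2: unique index + INSERT ... ON DUPLICATE KEY UPDATE.
// Table name, columns, and credentials are hypothetical.
$db = new mysqli('127.0.0.1', 'root', 'secret', 'zhihu');

// One-time schema: the UNIQUE KEY is what makes every dedup strategy above work.
$db->query("CREATE TABLE IF NOT EXISTS zhihu_user (
    id INT AUTO_INCREMENT PRIMARY KEY,
    user_id VARCHAR(64) NOT NULL,
    name VARCHAR(128),
    UNIQUE KEY uk_user_id (user_id)
)");

// A duplicate user_id updates the existing row instead of raising an error;
// with option 3 the statement would be INSERT IGNORE INTO ... instead.
$stmt = $db->prepare(
    'INSERT INTO zhihu_user (user_id, name) VALUES (?, ?)
     ON DUPLICATE KEY UPDATE name = VALUES(name)'
);
$stmt->bind_param('ss', $user_id, $name);

$user_id = 'mora-hu';
$name    = 'mora-hu';
$stmt->execute(); // first time: row inserted
$stmt->execute(); // second time: no duplicate row, existing row updated
```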
Using curl_multi for multi-threaded page crawling
At the beginning I crawled with a single process and a single curl handle, which was very slow: after a whole night of crawling I had only about 20,000 records. I wondered whether I could request several users at once when sending curl requests for new user pages, and then discovered the excellent curl_multi. The curl_multi family of functions can request multiple URLs at the same time instead of one by one, similar to a single process running multiple threads on Linux. Below is an example of a multi-threaded crawler built with curl_multi:
$mh = curl_multi_init(); // return a new cURL multi handle
for ($i = 0; $i < $max_size; $i++)
{
    $ch = curl_init(); // initialize an individual cURL session
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_URL, 'http://www.zhihu.com/people/' . $user_list[$i] . '/about');
    curl_setopt($ch, CURLOPT_COOKIE, self::$user_cookie);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.130 Safari/537.36');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    $requestMap[$i] = $ch;
    curl_multi_add_handle($mh, $ch); // add the individual handle to the multi session
}

$user_arr = array();
do {
    // run the sub-connections of the current cURL handles
    while (($cme = curl_multi_exec($mh, $active)) == CURLM_CALL_MULTI_PERFORM);

    if ($cme != CURLM_OK) { break; }

    // read the transfer information of the handles that have finished
    while ($done = curl_multi_info_read($mh))
    {
        $info = curl_getinfo($done['handle']);
        $tmp_result = curl_multi_getcontent($done['handle']);
        $error = curl_error($done['handle']);

        $user_arr[] = array_values(getUserInfo($tmp_result));

        // keep $max_size requests in flight at all times
        if ($i < sizeof($user_list) && isset($user_list[$i]))
        {
            $ch = curl_init();
            curl_setopt($ch, CURLOPT_HEADER, 0);
            curl_setopt($ch, CURLOPT_URL, 'http://www.zhihu.com/people/' . $user_list[$i] . '/about');
            curl_setopt($ch, CURLOPT_COOKIE, self::$user_cookie);
            curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.130 Safari/537.36');
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
            $requestMap[$i] = $ch;
            curl_multi_add_handle($mh, $ch);

            $i++;
        }

        curl_multi_remove_handle($mh, $done['handle']);
    }

    if ($active)
        curl_multi_select($mh, 10);
} while ($active);

curl_multi_close($mh);
return $user_arr;
HTTP 429 Too Many Requests
curl_multi makes it possible to send many requests at once, but when I fired 200 requests at a time, many of them never came back; requests were being dropped. Digging further, I printed each request handle's information with curl_getinfo, which returns an associative array of HTTP response information, including an http_code field holding the HTTP status code of the request. Many requests had an http_code of 429, which means too many requests were sent. I guessed that Zhihu had anti-crawler protection in place, so I tested against other sites and found that sending 200 requests at once was fine there, which confirmed my guess: Zhihu limits the number of simultaneous requests. I kept reducing the batch size and found that at 5 no packets were dropped any more. So in this program at most 5 requests can be sent at once; not many, but still a small improvement.
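To stop throttled requests from being silently lost, the completion loop can check http_code and retry. This is only a sketch under the article's setup: requeue() is a hypothetical helper, not something from the original code:

```php
<?php
// Sketch: handling a throttled (HTTP 429) response inside the curl_multi
// completion loop. $done comes from curl_multi_info_read(); requeue() is a
// hypothetical helper that pushes the URL back onto the pending list.
function handleDone($done, $url)
{
    $info = curl_getinfo($done['handle']);

    if ($info['http_code'] == 429) {
        // Too many requests: back off briefly and try this URL again later
        // instead of silently losing the response.
        sleep(1);
        requeue($url);
        return null;
    }

    return curl_multi_getcontent($done['handle']);
}
```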
Using Redis to record users that have already been visited
While crawling, I noticed that some users had already been visited and that their followers and followees had already been fetched. Although duplicates were handled at the database level, the program would still send curl requests for them, and these repeated requests meant a lot of redundant network overhead. In addition, the users waiting to be crawled had to be stored somewhere for the next run. At first I kept them in an array, but when I later added multiple processes this stopped working: in multi-process programming, child processes share the program code and libraries, but the variables a process uses are entirely its own. Variables in different processes are isolated and cannot be read by other processes, so an array cannot be used. So I turned to Redis to store both the users that had already been processed and the users waiting to be crawled. Each time a user is finished, it is pushed onto the already_request_queue; the users to be crawled (i.e. each user's followers and followees) are pushed onto the request_queue. Before each iteration, a user is popped from the request_queue and checked against the already_request_queue: if it is there, move on to the next one; otherwise, process it.
Example of using Redis in PHP:
<?php
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$redis->set('tmp', 'value');
if ($redis->exists('tmp'))
{
    echo $redis->get('tmp') . "\n";
}
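The queue logic described above could be sketched with the phpredis extension roughly like this. It is an illustration assuming a local Redis server; here the visited list is kept as a Redis set rather than a queue, so that the "seen before?" check is a single sIsMember call:

```php
<?php
// Sketch of the crawl queue: request_queue holds users to visit;
// already_request_set records users that were visited (a set, so
// sIsMember() answers "seen before?" cheaply).
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// Producer side: each crawled user's followers/followees are enqueued.
$redis->rPush('request_queue', 'mora-hu');
$redis->rPush('request_queue', 'excited-vczh');

// Consumer side: pop the next user, skip it if already processed.
while (($user_id = $redis->lPop('request_queue')) !== false) {
    if ($redis->sIsMember('already_request_set', $user_id)) {
        continue; // already crawled, skip
    }
    // ... crawl the user's page here ...
    $redis->sAdd('already_request_set', $user_id);
}
```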
Multi-process crawling with PHP's pcntl extension
After switching to curl_multi for multi-threaded crawling of user information, the program ran for a night and produced 100,000 records. That still did not reach my target, so I kept optimizing and discovered that PHP has a pcntl extension for multi-process programming. Here is a multi-process example:
// PHP multi-process demo
// fork 10 processes
for ($i = 0; $i < 10; $i++) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        echo "Could not fork!\n";
        exit(1);
    }
    if (!$pid) {
        echo "child process $i running\n";
        // exit once the child is done, so that it does not fork further children itself
        exit($i);
    }
}

// wait for the children to finish, to avoid zombie processes
while (pcntl_waitpid(0, $status) != -1) {
    $status = pcntl_wexitstatus($status);
    echo "Child $status completed\n";
}
Viewing the system's CPU information on Linux
With multi-process crawling implemented, I wanted to open more processes to keep crawling user data. After running 8 processes for a night I had only 200,000 records, not much of an improvement. Reading up on CPU performance tuning, I learned that a program's maximum number of processes should not be chosen arbitrarily but according to the number of CPU cores: a common rule of thumb is at most twice the core count. So I had to check the CPU information to find the number of cores. The Linux command for viewing CPU information is:
cat /proc/cpuinfo
Here, model name gives the CPU model and cpu cores the number of cores. In my case the core count is 1; since the program runs inside a virtual machine with few CPU cores allocated, only 2 processes could be opened. The final result: over one weekend, the crawler collected 1.1 million users' data.
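For scripting, the processor count can also be pulled out directly; the grep variant counts the logical processors listed in /proc/cpuinfo, and nproc is the coreutils shortcut for the same number:

```shell
# Count logical processors by counting "processor" entries in /proc/cpuinfo.
cores=$(grep -c '^processor' /proc/cpuinfo)
echo "$cores"

# coreutils provides the same number directly:
nproc
```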
Redis and MySQL connection problems in multi-process programs
Under multiple processes, after the program had run for a while, data could no longer be inserted into the database: MySQL reported "too many connections" errors, and Redis had the same problem.
The following code fails:
<?php
for ($i = 0; $i < 10; $i++) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        echo "Could not fork!\n";
        exit(1);
    }
    if (!$pid) {
        $redis = PRedis::getInstance();
        // do something
        exit;
    }
}
The root cause is that each child process, when created, inherits an identical copy of the parent process. Objects can be copied, but an established connection cannot be duplicated into several independent connections. The result is that every process uses the same Redis connection, each doing its own thing, which eventually produces baffling conflicts.
Solution:
The program cannot fully guarantee that the parent process creates no Redis connection instance before forking. Therefore, the fix has to come from the child processes themselves. If the instance obtained inside a child process were tied only to the current process, the problem would disappear. The solution is to slightly rework the static instantiation of the Redis class and bind the instance to the current process ID.
The reworked code looks like this:
<?php
public static function getInstance() {
    static $instances = array();
    $key = getmypid(); // get the current process ID
    if (empty($instances[$key])) {
        $instances[$key] = new self();
    }

    return $instances[$key];
}
Measuring script execution time in PHP
Because I wanted to know how much time each process spends, I wrote a function to measure script execution time:
function microtime_float()
{
    list($u_sec, $sec) = explode(' ', microtime());
    return (floatval($u_sec) + floatval($sec));
}

$start_time = microtime_float();

// do something
usleep(100);

$end_time = microtime_float();
$total_time = $end_time - $start_time;

$time_cost = sprintf("%.10f", $total_time);

echo "program cost total " . $time_cost . "s\n";
Data analysis
After crawling 1.1 million users' data, I did some light analysis; the results are as follows:
If anything in this article is incorrect, please point it out so it can be corrected.

