Trying out Snoopy, a PHP scraping tool

WBOY, 2016-07-21 15:27:34

What is Snoopy?
Snoopy is a PHP class that imitates the behavior of a web browser. It can fetch web page content and submit forms.
Some features of Snoopy:
* Easily fetches the content of web pages
* Easily fetches the text of web pages (with HTML tags stripped)
* Easily fetches the links in web pages
* Supports proxy hosts
* Supports basic username/password authentication
* Supports setting the user agent, referer, cookies, and header content
* Supports browser redirects, with control over the redirect depth
* Expands fetched links into fully qualified URLs (the default)
* Easily submits form data and retrieves the response
* Supports following HTML frames (added in v0.92)
* Supports passing cookies on redirects (added in v0.92)
For more detail, search online. Here are a few simple examples:
1. Fetch the content of a given URL
PHP code

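$url = "http://www.jb51.net";
include("snoopy.php");
$snoopy = new Snoopy;
$snoopy->fetch($url); // Fetch the full page content
echo $snoopy->results; // Display the result
// The related methods are called the same way and also store their output in $snoopy->results:
$snoopy->fetchtext($url); // Fetch the text content only (HTML stripped)
$snoopy->fetchlinks($url); // Fetch the links
$snoopy->fetchform($url); // Fetch the forms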

2. Form submission
PHP code

$formvars["username"] = "admin";
$formvars["pwd"] = "admin";
$action = "http://www.jb51.net"; // Form submission address
$snoopy->submit($action, $formvars); // $formvars is the array of fields to submit
echo $snoopy->results; // The response returned after the form is submitted
$snoopy->submittext($action, $formvars); // Submit and keep only the response text (HTML stripped)
$snoopy->submitlinks($action, $formvars); // Submit and keep only the links in the response

Once a form can be submitted, a lot becomes possible. Next, let's spoof the IP and the browser.
3. Spoofing
PHP code

$formvars["username"] = "admin";
$formvars["pwd"] = "admin";
$action = "http://www.jb51.net";
include "snoopy.php";
$snoopy = new Snoopy;
$snoopy->cookies["PHPSESSID"] = 'fc106b1918bd522cc863f36890e6fff7'; // Spoof the session ID
$snoopy->agent = "(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)"; // Spoof the browser
$snoopy->referer = "http://s.jb51.net"; // Spoof the referring page (HTTP_REFERER)
$snoopy->rawheaders["Pragma"] = "no-cache"; // Cache-related HTTP header
$snoopy->rawheaders["X_FORWARDED_FOR"] = "127.0.0.101"; // Spoof the IP
$snoopy->submit($action, $formvars);
echo $snoopy->results;

So the session, the browser, and the IP can all be spoofed, which opens up a lot of possibilities.
For example, a poll guarded by a verification code and an IP check can be voted on repeatedly.
P.S. Spoofing the IP here really means spoofing an HTTP header, so an IP that the server reads from REMOTE_ADDR cannot be spoofed this way.
Sites that read the IP from an HTTP header (the kind that try to see through proxies), on the other hand, will accept whatever IP you supply.
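Seen from the server side, the difference between REMOTE_ADDR and a forwarded-IP header can be sketched as follows. This is only an illustration, assuming PHP 7.1 or later: the function name `resolveClientIp`, the trusted-proxy list, and the sample addresses are all hypothetical and are not part of Snoopy or of the original article. The idea is that the header is believed only when the connection itself comes from a proxy the site operates.

```php
<?php
/**
 * Hypothetical helper: decide which client IP a server should believe.
 * $server mimics $_SERVER; $trustedProxies is an assumed allow-list of proxy addresses.
 */
function resolveClientIp(array $server, array $trustedProxies = []): string
{
    // REMOTE_ADDR is the address of the TCP peer; a request header cannot change it.
    $peer = $server['REMOTE_ADDR'] ?? '';

    // Only consult the forwarded header when the peer is one of our own proxies.
    if (in_array($peer, $trustedProxies, true) && !empty($server['HTTP_X_FORWARDED_FOR'])) {
        // The header may hold a comma-separated chain; the right-most entry
        // is the one appended by the proxy that connected to us.
        $chain = array_map('trim', explode(',', $server['HTTP_X_FORWARDED_FOR']));
        $candidate = end($chain);
        if (filter_var($candidate, FILTER_VALIDATE_IP) !== false) {
            return $candidate;
        }
    }

    return $peer;
}

// A direct request carrying a made-up header: the header is ignored.
echo resolveClientIp([
    'REMOTE_ADDR' => '203.0.113.7',
    'HTTP_X_FORWARDED_FOR' => '127.0.0.101',
]), "\n";

// The same kind of header arriving through a trusted proxy at 10.0.0.1 is used.
echo resolveClientIp([
    'REMOTE_ADDR' => '10.0.0.1',
    'HTTP_X_FORWARDED_FOR' => '198.51.100.23',
], ['10.0.0.1']), "\n";
```

A site that reads the header unconditionally behaves like the second call for every visitor, which is why the header value alone is not a reliable identity.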
A brief explanation of how the verification code is handled:
First, view the page in an ordinary browser and find the session ID that the verification code is tied to.
Note down both the session ID and the verification code value.
Then use Snoopy to send that same session ID.
Principle: because the session ID is the same, the verification code the server expects is the same one you saw the first time.
4. Sometimes we need to forge more, and Snoopy has that covered as well
PHP code

$snoopy->proxy_host = "www.jb51.net";
$snoopy->proxy_port = "8080"; // Use a proxy
$snoopy->maxredirs = 2; // Maximum number of redirects to follow
$snoopy->expandlinks = true; // Whether to expand links into full URLs; often used when scraping
// For example, the link /images/taoav.gif is expanded to the full link http://www.jb51.net/images/taoav.gif. This can also be done with ereg_replace at final output time.
$snoopy->maxframes = 5; // Maximum number of frames to follow
// Note: when frames are fetched, $snoopy->results is an array
echo $snoopy->error; // Print the error message, if any

With the basics covered, let me demonstrate with a complete example:
PHP code

<?php
//echo var_dump($_SERVER);
include("Snoopy.class.php");
$snoopy = new Snoopy;
// Browser information: use the string from the same browser you copied the cookies from
// (P.S. $_SERVER shows the browser information)
$snoopy->agent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5 FirePHP/0.2.1";
$snoopy->referer = "http://bbs.jb51.net/index.php";
$snoopy->expandlinks = true;
$snoopy->rawheaders["COOKIE"]="__utmz=17229162.1227682761.29.7.utmccn=( referral)|utmcsr=jb51.net|utmcct=/html/index.html|utmcmd=referral; cdbphpchina_smile=1D2D0D1; cdbphpchina_cookietime=2592000; __utma=233700831.1562900865.1227113506.1229613449.12 31233266.16; __utmz=233700831.1231233266.16.8.utmccn=(referral)| utmcsr=localhost:8080|utmcct=/test3.php|utmcmd=referral; __utma=17229162.1877703507.1227113568.1231228465.1231233160.58; uchome_loginuser=sinopf; xscdb_cookietime=259200 0; __utmc=17229162; __utmb=17229162; cdbphpchina_sid=EX5w1V; __utmc=233700831; cdbphpchina_visitedfid =17; cdbphpchinaO766uPYGK6OWZaYlvHSuzJIP22VpwEMGnPQAuWCFL9Fd6CHp2e%2FKw0x4bKz0N9lGk; ZrVKgqPOttHVr%2B6KLPg3DtWpTMUI4ttqNNVpukUj6ElM; cdbphpchina_onlineusernum=3721";
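$snoopy->fetch("http://bbs.jb51.net");
// Note: ereg_replace() was removed in PHP 7.0; on newer versions str_replace() takes the same three arguments here.
$n = ereg_replace("href=\"", "href=\"http://bbs.jb51.net/", $snoopy->results);
echo ereg_replace("src=\"", "src=\"http://bbs.jb51.net/", $n);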
?>

This simulates being logged in to the PHPCHINA forum. First, check your browser's information: echo var_dump($_SERVER); shows it. Copy the value of $_SERVER['HTTP_USER_AGENT'] and paste it into $snoopy->agent. Next, check your own cookies: after logging in to the forum with your own account, enter javascript:document.write(document.cookie) in the browser's address bar and press Enter to see your cookie information, then copy it and paste it after $snoopy->rawheaders["COOKIE"]=. (My own cookie information has been redacted for security reasons.)
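As an alternative to pasting the whole string into a raw header, the copied cookie text can be split into a name/value array. The sketch below assumes PHP 7.1 or later; `parseCookieString` is a hypothetical helper, not part of Snoopy, and the `demo` cookie in the sample is made up to show decoding. Values are URL-decoded on the assumption that Snoopy URL-encodes the values in its cookies array again when it builds the request, which is worth checking against the version of the class you use.

```php
<?php
/**
 * Hypothetical helper: turn a "name=value; name2=value2" cookie string
 * (as printed by document.cookie) into a name => value array.
 */
function parseCookieString(string $cookieString): array
{
    $cookies = [];
    foreach (explode(';', $cookieString) as $pair) {
        $pair = trim($pair);
        if ($pair === '' || strpos($pair, '=') === false) {
            continue; // Skip empty or malformed fragments
        }
        // Split on the first "=" only, since values may contain "=" themselves.
        [$name, $value] = explode('=', $pair, 2);
        // Assumption: the consumer re-encodes values, so store them decoded.
        $cookies[trim($name)] = urldecode(trim($value));
    }
    return $cookies;
}

// Sample built from cookie names in the article plus one made-up encoded value.
$cookies = parseCookieString('uchome_loginuser=sinopf; cdbphpchina_sid=EX5w1V; demo=a%2Fb');
print_r($cookies);
```

The resulting array could then be assigned to $snoopy->cookies, the same property used for PHPSESSID in the spoofing example, instead of setting $snoopy->rawheaders["COOKIE"] by hand.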

Then note these two lines:

# $n = ereg_replace("href=\"", "href=\"http://bbs.jb51.net/", $snoopy->results);

# echo ereg_replace("src=\"", "src=\"http://bbs.jb51.net/", $n);

Because all the link and resource addresses in the fetched HTML are relative, they have to be replaced with absolute links so that the forum's images and CSS styles can still be loaded.
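Since ereg_replace() was removed in PHP 7.0, the same rewrite can be sketched with preg_replace_callback() on current PHP versions (7.1 or later assumed here). The helper name `absolutizeLinks` is hypothetical. Unlike the two lines above, this version leaves values alone when they already carry a scheme (http:, mailto:, and so on), start with //, or are in-page anchors. Like the original, it simply prefixes the site root, so a path-relative link such as images/a.gif is resolved against the root rather than against the current page's directory.

```php
<?php
/**
 * Hypothetical helper: prefix relative href/src values with a base URL.
 * Absolute URLs, protocol-relative URLs, and "#" anchors are left untouched.
 */
function absolutizeLinks(string $html, string $baseUrl): string
{
    $base = rtrim($baseUrl, '/') . '/';

    // Match href="..." or src="..." (but not data-src etc.) whose value
    // does not start with a scheme, "//" or "#". A leading "/" is dropped
    // because $base already ends with one.
    $pattern = '~(?<![\w-])(href|src)="(?![a-z][a-z0-9+.-]*:|//|#)/?([^"]*)"~i';

    $result = preg_replace_callback($pattern, function (array $m) use ($base): string {
        return $m[1] . '="' . $base . $m[2] . '"';
    }, $html);

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $html;
}

echo absolutizeLinks('<img src="/images/taoav.gif">', 'http://www.jb51.net'), "\n";
// prints: <img src="http://www.jb51.net/images/taoav.gif">
```

With the forum example, the call would be absolutizeLinks($snoopy->results, "http://bbs.jb51.net"), assuming $snoopy->results holds a single HTML string rather than the array returned when frames are fetched.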
