


python scan proxy and how to get the available proxy IP example sharing
The following editor will bring you an example of python scanning proxy and obtaining the available proxy IP. The editor thinks it’s pretty good, so I’ll share it with you now and give it as a reference. Let’s follow the editor and take a look.
Today we will write a very practical tool, which is to scan and obtain available proxies
First of all , I first found a website on Baidu: www.xicidaili.com As an example
This website publishes many proxy IPs and ports available at home and abroad
We still proceed as usual. For analysis, let’s scan all domestic proxies first.
Click on the domestic part to review and find that the domestic proxy and directory are the following url:
www.xicidaili.com/nn/x
This x has almost more than 2,000 pages, so it seems that thread processing is required again. . .
As usual, we try to see if we can get the content directly with the simplest requests.get()
returns 503, then we add a simple headers
and return 200, OK
Okay, let’s first analyze the content of the web page and get the content we want
We found that the content containing IP information is within the
But we later found that the contents of ip, port, and protocol were in the 2nd, 3rd, and 6th
So we started to try to write, here is the writing idea:
When processing the page, we first extract the tr tag, and then add the tr tag to the page. Extracting the td tag in the tag
Therefore, two bs operations are used, and str processing is required when using the bs operation for the second time
Because after we obtain tr, we need 2 of them, Things No. 3 and 6,
But when we use the i output by a for loop, we cannot perform group operations
So we simply perform a second operation on the soup of each td separately Then directly extract 2,3,6
After extraction, just add .string to extract the content
r = requests.get(url = url,headers = headers) soup = bs(r.content,"html.parser") data = soup.find_all(name = 'tr',attrs = {'class':re.compile('|[^odd]')}) for i in data: soup = bs(str(i),'html.parser') data2 = soup.find_all(name = 'td') ip = str(data2[1].string) port = str(data2[2].string) types = str(data2[5].string).lower() proxy = {} proxy[types] = '%s:%s'%(ip,port)
In this way, we can get it every time in the loop Generate the corresponding proxy dictionary so that we can use the
dictionary to verify the IP availability. One thing to note here is that we have an operation to change the type to lowercase, because it is written in proxies in the get method. The protocol name should be in lowercase, and the webpage captures the content in uppercase, so a case conversion is performed
So what is the idea of verifying the availability of the IP
It is very simple, we use get, add Go to our proxy and request the website:
http://1212.ip138.com/ic.asp
This is a magical website that can return your external network IP
url = 'http://1212.ip138.com/ic.asp' r = requests.get(url = url,proxies = proxy,timeout = 6)
Here we need to add timeout to remove those agents that wait too long. I set it to 6 seconds
We use one IP Try and analyze the returned page
The returned content is as follows:
<html> <head> <meta xxxxxxxxxxxxxxxxxx> <title> 您的IP地址 </title> </head> <body style="margin:0px"><center>您的IP是:[xxx.xxx.xxx.xxx] 来自:xxxxxxxx</center></body></html>
Then we only need to extract the [] The content can be
If our proxy is available, the proxy’s IP
will be returned (the returned address here will still be our local external network IP, although I am not the same) It's very clear, but I excluded this situation. The proxy should still be unavailable)
Then we can make a judgment. If the returned ip is the same as the ip in the proxy dictionary, then the ip is considered available. Agent and write it to the file
This is our idea. Finally, we can process the queue and threading threads
The code above:
#coding=utf-8 import requests import re from bs4 import BeautifulSoup as bs import Queue import threading class proxyPick(threading.Thread): def __init__(self,queue): threading.Thread.__init__(self) self._queue = queue def run(self): while not self._queue.empty(): url = self._queue.get() proxy_spider(url) def proxy_spider(url): headers = { ....... } r = requests.get(url = url,headers = headers) soup = bs(r.content,"html.parser") data = soup.find_all(name = 'tr',attrs = {'class':re.compile('|[^odd]')}) for i in data: soup = bs(str(i),'html.parser') data2 = soup.find_all(name = 'td') ip = str(data2[1].string) port = str(data2[2].string) types = str(data2[5].string).lower() proxy = {} proxy[types] = '%s:%s'%(ip,port) try: proxy_check(proxy,ip) except Exception,e: print e pass def proxy_check(proxy,ip): url = 'http://1212.ip138.com/ic.asp' r = requests.get(url = url,proxies = proxy,timeout = 6) f = open('E:/url/ip_proxy.txt','a+') soup = bs(r.text,'html.parser') data = soup.find_all(name = 'center') for i in data: a = re.findall(r'\[(.*?)\]',i.string) if a[0] == ip: #print proxy f.write('%s'%proxy+'\n') print 'write down' f.close() #proxy_spider() def main(): queue = Queue.Queue() for i in range(1,2288): queue.put('http://www.xicidaili.com/nn/'+str(i)) threads = [] thread_count = 10 for i in range(thread_count): spider = proxyPick(queue) threads.append(spider) for i in threads: i.start() for i in threads: i.join() print "It's down,sir!" if __name__ == '__main__': main()
In this way we can write all the available proxy IPs provided on the website into the file ip_proxy.txt
The above is the detailed content of python scan proxy and how to get the available proxy IP example sharing. For more information, please follow other related articles on the PHP Chinese website!

TomergelistsinPython,youcanusethe operator,extendmethod,listcomprehension,oritertools.chain,eachwithspecificadvantages:1)The operatorissimplebutlessefficientforlargelists;2)extendismemory-efficientbutmodifiestheoriginallist;3)listcomprehensionoffersf

In Python 3, two lists can be connected through a variety of methods: 1) Use operator, which is suitable for small lists, but is inefficient for large lists; 2) Use extend method, which is suitable for large lists, with high memory efficiency, but will modify the original list; 3) Use * operator, which is suitable for merging multiple lists, without modifying the original list; 4) Use itertools.chain, which is suitable for large data sets, with high memory efficiency.

Using the join() method is the most efficient way to connect strings from lists in Python. 1) Use the join() method to be efficient and easy to read. 2) The cycle uses operators inefficiently for large lists. 3) The combination of list comprehension and join() is suitable for scenarios that require conversion. 4) The reduce() method is suitable for other types of reductions, but is inefficient for string concatenation. The complete sentence ends.

PythonexecutionistheprocessoftransformingPythoncodeintoexecutableinstructions.1)Theinterpreterreadsthecode,convertingitintobytecode,whichthePythonVirtualMachine(PVM)executes.2)TheGlobalInterpreterLock(GIL)managesthreadexecution,potentiallylimitingmul

Key features of Python include: 1. The syntax is concise and easy to understand, suitable for beginners; 2. Dynamic type system, improving development speed; 3. Rich standard library, supporting multiple tasks; 4. Strong community and ecosystem, providing extensive support; 5. Interpretation, suitable for scripting and rapid prototyping; 6. Multi-paradigm support, suitable for various programming styles.

Python is an interpreted language, but it also includes the compilation process. 1) Python code is first compiled into bytecode. 2) Bytecode is interpreted and executed by Python virtual machine. 3) This hybrid mechanism makes Python both flexible and efficient, but not as fast as a fully compiled language.

Useaforloopwheniteratingoverasequenceorforaspecificnumberoftimes;useawhileloopwhencontinuinguntilaconditionismet.Forloopsareidealforknownsequences,whilewhileloopssuitsituationswithundeterminediterations.

Pythonloopscanleadtoerrorslikeinfiniteloops,modifyinglistsduringiteration,off-by-oneerrors,zero-indexingissues,andnestedloopinefficiencies.Toavoidthese:1)Use'i


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

SublimeText3 English version
Recommended: Win version, supports code prompts!

PhpStorm Mac version
The latest (2018.2.1) professional PHP integrated development tool

SAP NetWeaver Server Adapter for Eclipse
Integrate Eclipse with SAP NetWeaver application server.

Safe Exam Browser
Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.

WebStorm Mac version
Useful JavaScript development tools
