Example of Python crawler grabbing proxy IP and checking availability

不言

May 07, 2018, 12:00 PM

This article presents an example of a Python crawler that scrapes proxy IPs and checks whether they are usable. It may be a useful reference for anyone who needs one.

Anyone who writes crawlers regularly will sooner or later have their IP blocked by the target website, and a single IP is certainly not enough. As a frugal programmer, I prefer not to spend money when I can avoid it, so I went looking for proxies myself. This time I wrote a crawler for the IPs listed on the Xici proxy site ("West Spur proxy", xici.net.co in the code below), but that site blocks crawlers too!

As for how to deal with that, I think increasing the delay between requests is worth trying. I was probably crawling too frequently, which is why my IP got blocked.
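The delay idea could be sketched roughly as follows. This is only an illustration in Python 3, not part of the original script: the helper names `polite_delay` and `crawl_pages`, and the 2 to 3 second range, are assumptions chosen for the example. The download function is passed in as `fetch` so the pacing logic can be tried without touching the network.

```python
import random
import time


def polite_delay(base_seconds=2.0, jitter_seconds=1.0):
    """Return a randomized pause length in seconds (hypothetical helper)."""
    return base_seconds + random.uniform(0.0, jitter_seconds)


def crawl_pages(pages, fetch, sleep=time.sleep):
    """Call fetch(page) for each page, pausing between requests.

    `fetch` is any callable that downloads one listing page. `sleep` is
    injectable so the pacing can be exercised without actually waiting.
    """
    results = []
    for index, page in enumerate(pages):
        if index > 0:
            # Wait before every request except the first one.
            sleep(polite_delay())
        results.append(fetch(page))
    return results
```

A random jitter is used instead of a fixed interval so that the requests do not arrive at perfectly regular intervals. Whether this is enough to avoid a block depends on the target site.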

Alternatively, you can try IP Bus as a proxy source instead. All roads lead to Rome, and there is no need to hang yourself on a single tree.

No nonsense, just code.

#!/usr/bin/env python
# -*- coding:utf8 -*-
# Note: this script targets Python 2 (urllib2, print statements).
import urllib2
import time
from bs4 import BeautifulSoup
import sys

# Python 2 only: force the default string encoding to UTF-8.
reload(sys)
sys.setdefaultencoding("utf-8")

# Browser-like headers sent with every listing-page request.
req_header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    #'Accept-Language': 'en-US,en;q=0.8,zh-Hans-CN;q=0.5,zh-Hans;q=0.3',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'en-us',
    'Connection': 'keep-alive',
    'Referer': 'http://www.baidu.com/'
}
req_timeout = 5
testUrl = "http://www.baidu.com/"
testStr = "wahaha"  # marker that must appear in the test page for a proxy to pass
file1 = open('proxy.txt', 'w')
# url = ""
# req = urllib2.Request(url,None,req_header)
# jsondatas = urllib2.urlopen(req,None,req_timeout).read()
cookies = urllib2.HTTPCookieProcessor()
checked_num = 0  # proxies that passed the availability check
grasp_num = 0    # proxies scraped from the listing pages
for page in range(1, 160):
    # Download one listing page and locate the rows of the proxy table.
    req = urllib2.Request('http://www.xici.net.co/nn/' + str(page), None, req_header)
    html_doc = urllib2.urlopen(req, None, req_timeout).read()
    # html_doc = urllib2.urlopen('http://www.xici.net.co/nn/' + str(page)).read()
    soup = BeautifulSoup(html_doc, 'html.parser')
    trs = soup.find('table', id='ip_list').find_all('tr')
    for tr in trs[1:]:  # skip the header row
        tds = tr.find_all('td')
        ip = tds[1].text.strip()
        port = tds[2].text.strip()
        protocol = tds[5].text.strip()
        if protocol == 'HTTP' or protocol == 'HTTPS':
            #of.write('%s=%s:%s\n' % (protocol, ip, port))
            print '%s=%s:%s' % (protocol, ip, port)
            grasp_num += 1
            # Route the test request through the candidate proxy.
            proxyHandler = urllib2.ProxyHandler({"http": r'http://%s:%s' % (ip, port)})
            opener = urllib2.build_opener(cookies, proxyHandler)
            opener.addheaders = [('User-Agent',
                'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36')]
            t1 = time.time()
            try:
                req = opener.open(testUrl, timeout=req_timeout)
                result = req.read()
                timeused = time.time() - t1
                pos = result.find(testStr)
                # find() returns -1 when the marker is absent, so any
                # index from 0 upward counts as a hit.
                if pos > -1:
                    file1.write(protocol + "\t" + ip + "\t" + port + "\n")
                    checked_num += 1
                    print checked_num, grasp_num
                else:
                    continue
            except Exception as e:
                # Timeouts and connection errors mean the proxy is unusable.
                continue
file1.close()
print checked_num, grasp_num
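The script above is written for Python 2 (`urllib2`, `print` statements). As a rough sketch only, the availability check could be isolated and ported to Python 3 along the following lines. The function names `build_opener_for` and `check_proxy` are hypothetical and not from the original. The test URL, marker string, and timeout are copied from the script, and the `opener` parameter exists only so the logic can be exercised with a stand-in object instead of a live proxy.

```python
import time
import urllib.request

TEST_URL = "http://www.baidu.com/"  # same test page as the script above
TEST_STR = "wahaha"                 # same marker string as the script above
REQ_TIMEOUT = 5                     # seconds, as in the script above


def build_opener_for(ip, port):
    """Build a urllib opener that routes HTTP traffic through ip:port."""
    proxy = urllib.request.ProxyHandler({"http": "http://%s:%s" % (ip, port)})
    return urllib.request.build_opener(proxy)


def check_proxy(ip, port, opener=None, marker=TEST_STR):
    """Return (ok, seconds_used).

    ok is True only if the test page was fetched through the proxy
    and contains the marker string.
    """
    if opener is None:
        opener = build_opener_for(ip, port)
    started = time.time()
    try:
        with opener.open(TEST_URL, timeout=REQ_TIMEOUT) as response:
            body = response.read().decode("utf-8", errors="replace")
    except Exception:
        # Any failure (timeout, refused connection, bad reply) counts
        # as an unusable proxy, mirroring the original script.
        return False, time.time() - started
    return marker in body, time.time() - started
```

Returning the elapsed time alongside the result makes it possible to rank proxies by speed later, which the original script measures (`timeused`) but does not record.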

Personally, I feel the code is not too complicated, so I kept comments to a minimum. I believe most readers can follow it. If you find any problems, please point them out and correct me so we can improve together!
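Since the script writes each working proxy to `proxy.txt` as a tab-separated line (protocol, IP, port), reading the results back might look something like this. It is a minimal Python 3 sketch, and the helper names `parse_proxy_lines` and `load_proxies` as well as the dictionary keys are assumptions made for illustration.

```python
def parse_proxy_lines(lines):
    """Turn 'PROTOCOL<TAB>IP<TAB>PORT' lines into dicts, skipping malformed ones."""
    proxies = []
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) != 3:
            continue  # blank or malformed line
        protocol, ip, port = parts
        proxies.append({
            "protocol": protocol,
            "ip": ip,
            "port": port,
            "address": "%s:%s" % (ip, port),
        })
    return proxies


def load_proxies(path="proxy.txt"):
    """Read the file written by the crawler and parse every line."""
    with open(path, encoding="utf-8") as handle:
        return parse_proxy_lines(handle)
```

Keeping the parsing separate from the file access means the same function can be applied to lines from any source, for example a list held in memory.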

