When a Python Crawler Hits Error 10060


WBOY

Jun 21, 2016, 08:50 AM

Anyone who has written a web crawler knows how convenient Python's urllib2 is. A few lines of code are enough to fetch the source of a web page:

    #coding=utf-8
    import urllib
    import urllib2
    import re

    url = "http://wetest.qq.com"
    request = urllib2.Request(url)
    page = urllib2.urlopen(request)
    html = page.read()
    print html

Finally, run some regular-expression matching over the response body to pull out whatever you need.
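As a rough sketch of that regex step, the example below extracts image URLs from a made-up HTML snippet. The pattern is based on the one used in the scripts later in this article, tightened with `[^"]` (an assumption of this sketch, not the original pattern) so that a match cannot run across several attributes. The sample markup and the function name `extract_image_urls` are hypothetical, and the code is written for Python 3.

```python
import re

# Based on the pattern used later in the article; [^"] replaces "." here so a
# match cannot spill over into the next tag (a change made for this sketch).
IMG_PATTERN = re.compile(r'src="([^"]+?\.jpg)" pic_ext')


def extract_image_urls(html):
    """Return every .jpg URL matched by IMG_PATTERN, in document order."""
    return IMG_PATTERN.findall(html)


# Hypothetical markup, invented for illustration only.
sample_html = (
    '<img class="BDE_Image" src="http://example.com/a.jpg" pic_ext="jpeg">'
    '<img src="http://example.com/logo.png">'
    '<img class="BDE_Image" src="http://example.com/b.jpg" pic_ext="jpeg">'
)

print(extract_image_urls(sample_html))
```

Only the two `.jpg` sources followed by `pic_ext` are returned; the `.png` logo is skipped.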

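However, fetching pages this way fails for some external sites when the script runs on the office network or the development network.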

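For example, requesting http://tieba.baidu.com/p/2460150866 keeps failing with error code 10060 (the Windows socket error for a connection attempt that timed out), reporting that the connection failed.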

    #coding=utf-8
    import urllib
    import urllib2
    import re

    url = "http://tieba.baidu.com/p/2460150866"
    request = urllib2.Request(url)
    page = urllib2.urlopen(request)
    html = page.read()
    print html

Running it produces the following error: [screenshot of the 10060 error message]
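To tell this kind of failure apart from other network errors, a script can inspect the exception it receives. The sketch below is written for Python 3 (where urllib2 became urllib.request and urllib.error) and is only one possible approach; the helper names `is_connect_timeout` and `fetch` are hypothetical.

```python
import socket
import urllib.error
import urllib.request

WSAETIMEDOUT = 10060  # Windows socket error code for a timed-out connection attempt


def is_connect_timeout(exc):
    """Return True if exc looks like a connection timeout, including error 10060."""
    # urlopen wraps low-level socket errors in URLError; unwrap if needed.
    reason = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(reason, socket.timeout):
        return True
    if isinstance(reason, OSError):
        # winerror only exists on Windows; fall back to errno elsewhere.
        code = getattr(reason, "winerror", None) or reason.errno
        return code == WSAETIMEDOUT
    return False


def fetch(url, timeout=10):
    """Fetch a URL and print a hint when the connection attempt times out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as page:
            return page.read()
    except OSError as exc:  # URLError is a subclass of OSError
        if is_connect_timeout(exc):
            print("connection timed out; a proxy may be required:", url)
        raise
```

A timeout on a site that opens fine in a browser is a hint that the browser is going through a proxy that the script does not know about.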

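To track down the cause, I went through the following steps:

1. Entering the URL in a browser opens the page normally, so the site itself is reachable.

2. The same script runs fine on the company's trial network, so the script itself is not at fault.

These two checks suggest that the failure comes from the company's access policy for external sites. So I looked up how to configure a proxy for urllib2 with ProxyHandler and changed the code as follows: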

    #coding=utf-8
    import urllib
    import urllib2
    import re

    # The proxy address and port:
    proxy_info = { 'host' : 'web-proxy.oa.com','port' : 8080 }
    # We create a handler for the proxy
    proxy_support = urllib2.ProxyHandler({"http" : "http://%(host)s:%(port)d" % proxy_info})
    # We create an opener which uses this handler:
    opener = urllib2.build_opener(proxy_support)
    # Then we install this opener as the default opener for urllib2:
    urllib2.install_opener(opener)

    url = "http://tieba.baidu.com/p/2460150866"
    request = urllib2.Request(url)
    page = urllib2.urlopen(request)
    html = page.read()
    print html
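For readers on Python 3, where urllib2 was merged into urllib.request, the same proxy setup might look roughly like the sketch below. The proxy host and port are the ones from the article; the function name `build_proxy_opener` is hypothetical. The sketch only builds the opener and does not make a request, so it runs without network access.

```python
import urllib.request

# Proxy host and port as given in the article; adjust for your own network.
PROXY_INFO = {"host": "web-proxy.oa.com", "port": 8080}


def build_proxy_opener(host, port):
    """Build an opener that sends http requests through the given proxy."""
    proxy_url = "http://%s:%d" % (host, port)
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


opener = build_proxy_opener(PROXY_INFO["host"], PROXY_INFO["port"])

# opener.open(url) would now go through the proxy. Calling
# urllib.request.install_opener(opener) would make urllib.request.urlopen
# use it by default, mirroring urllib2.install_opener in the article.
```

Keeping the opener as an explicit object, instead of installing it globally, makes it obvious which requests go through the proxy.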

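With the proxy configured, running the script again returns the HTML page. Is that the end of it? Not quite! I wanted to grab the pictures in the Tieba thread and save them locally. Here is the code: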

    #coding=utf-8
    import urllib
    import urllib2
    import re

    # The proxy address and port:
    proxy_info = { 'host' : 'web-proxy.oa.com','port' : 8080 }
    # We create a handler for the proxy
    proxy_support = urllib2.ProxyHandler({"http" : "http://%(host)s:%(port)d" % proxy_info})
    # We create an opener which uses this handler:
    opener = urllib2.build_opener(proxy_support)
    # Then we install this opener as the default opener for urllib2:
    urllib2.install_opener(opener)

    url = "http://tieba.baidu.com/p/2460150866"
    request = urllib2.Request(url)
    page = urllib2.urlopen(request)
    html = page.read()

    # Regex match for the image URLs
    reg = r'src="(.+?\.jpg)" pic_ext'
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)

    print 'start download pic'
    x = 0
    for imgurl in imglist:
        urllib.urlretrieve(imgurl, 'pic\\%s.jpg' % x)
        x = x + 1

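Running it again still fails! It is error 10060 again, even though the proxy has been set for urllib2. Why?

The reason is that urllib.urlretrieve belongs to the older urllib module, which does not use the opener installed with urllib2.install_opener, so the image requests bypass the proxy. I still wanted those pictures, so I kept looking for a way. Since the regex already yields the image URLs, why not call urllib2.urlopen on each URL directly, read the binary image data from the response, and write it to a local file? That leads to the code below: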

    #coding=utf-8
    import urllib
    import urllib2
    import re

    # The proxy address and port:
    proxy_info = { 'host' : 'web-proxy.oa.com','port' : 8080 }
    # We create a handler for the proxy
    proxy_support = urllib2.ProxyHandler({"http" : "http://%(host)s:%(port)d" % proxy_info})
    # We create an opener which uses this handler:
    opener = urllib2.build_opener(proxy_support)
    # Then we install this opener as the default opener for urllib2:
    urllib2.install_opener(opener)

    url = "http://tieba.baidu.com/p/2460150866"
    request = urllib2.Request(url)
    page = urllib2.urlopen(request)
    html = page.read()

    # Regex match for the image URLs
    reg = r'src="(.+?\.jpg)" pic_ext'
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)

    x = 0
    print 'start'
    for imgurl in imglist:
        print imgurl
        # urllib2.urlopen goes through the installed proxy opener
        resp = urllib2.urlopen(imgurl)
        respHtml = resp.read()
        # Write the raw image bytes to a local file
        picFile = open('%s.jpg' % x, "wb")
        picFile.write(respHtml)
        picFile.close()
        x = x + 1
    print 'done'

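Running the script again prints the image URLs as expected, and the images are saved locally: [screenshot of the printed URLs and saved image files]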
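The download loop can also be sketched for Python 3 as a small function that takes the opener as a parameter, so every image request visibly goes through the same proxy-aware object. The names `save_images` and `FakeOpener` are hypothetical; `FakeOpener` is an offline stand-in so the sketch runs without a network, and in real use an opener built with urllib.request.build_opener would be passed instead.

```python
import io
import os
import tempfile


def save_images(opener, image_urls, dest_dir):
    """Download each URL through opener.open() and save it as 0.jpg, 1.jpg, ..."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for index, image_url in enumerate(image_urls):
        path = os.path.join(dest_dir, "%d.jpg" % index)
        # Read the raw bytes from the response and write them unchanged.
        with opener.open(image_url) as resp, open(path, "wb") as pic_file:
            pic_file.write(resp.read())
        saved.append(path)
    return saved


class FakeOpener:
    """Hypothetical stand-in for a real opener so the sketch runs offline."""

    def open(self, url):
        return io.BytesIO(b"fake image bytes for " + url.encode("utf-8"))


demo_dir = tempfile.mkdtemp()
demo_urls = ["http://example.com/a.jpg", "http://example.com/b.jpg"]
saved_paths = save_images(FakeOpener(), demo_urls, demo_dir)
print([os.path.basename(p) for p in saved_paths])
```

Passing the opener explicitly avoids the trap described above, where one call silently used a different code path that knew nothing about the proxy.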

That completes what I set out to do. I hope this write-up is useful to others as well.
