Home >Backend Development >Python Tutorial >一则python3的简单爬虫代码

一则python3的简单爬虫代码

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOriginal: 2016-06-06 11:30:411779browse

不得不说python的上手非常简单。在网上找了一下，大都是python2的帖子，于是随手写了个python3的。代码非常简单就不解释了，直接贴代码。

代码如下:

#test rdp
import urllib.request
import re

#登录用的帐户信息
data={}
data['fromUrl']=''
data['fromUrlTemp']=''
data['loginId']='12345'
data['password']='12345'
user_agent='Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
#登录地址
#url='http://192.168.1.111:8080/loginCheck'
postdata = urllib.parse.urlencode(data)
postdata = postdata.encode('utf-8')
headers = { 'User-Agent' : user_agent }
#登录
res = urllib.request.urlopen(url,postdata)
#取得页面html
strResult=(res.read().decode('utf-8'))
#用正则表达式取出所有A标签
p = re.compile(r'(.*?)')
for m in p.finditer(strResult):
print (m.group(1))#group(1)是href里面的内容，group(2)是a标签里的文字

关于cookie、异常等处理看了一下，没有花时间去处理，毕竟只是想通过写爬虫来学习python。

Statement：

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Previous article：从零学Python之入门（四）运算Next article：从零学python系列之新版本导入httplib模块报ImportError解决方案

See more

一则python3的简单爬虫代码

Related articles