
Common Python commands for accessing and crawling web pages, with examples

Y2J (original)
2017-04-25 09:22:13

This article introduces the common commands for accessing and crawling web pages in Python, as a quick reference for readers who need them.

Common commands for accessing and crawling web pages in Python

Simple crawling of web pages:

import urllib.request
url = "http://google.cn/"
response = urllib.request.urlopen(url)  # returns a file-like object
page = response.read()  # the page body, as bytes

Save the URL directly as a local file:

import urllib.request
url = "http://google.cn/"
# download the page and write it to a local file ("google.html" is an assumed filename)
urllib.request.urlretrieve(url, "google.html")
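
Saving to disk can also be sketched by streaming the response into a file that is opened explicitly, which keeps control over the file mode and avoids holding the whole page in memory. The helper name save_url and its filename parameter are hypothetical and not part of the original article.

```python
import shutil
import urllib.request


def save_url(url, filename):
    """Stream the response body for url into filename and return the path.

    save_url is a hypothetical helper name used only for this sketch.
    """
    # Open in binary mode: the response yields bytes, not str.
    with urllib.request.urlopen(url) as response, open(filename, "wb") as out:
        shutil.copyfileobj(response, out)
    return filename
```

Under these assumptions, save_url("http://google.cn/", "google.html") would write the page to google.html in the current directory.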

POST method:

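import urllib.parse
import urllib.request
url = "http://liuxin-blog.appspot.com/messageboard/add"
values = {"content": "命令行发出网页请求测试"}  # "test of a web request sent from the command line"
# in Python 3 the POST body must be bytes, so encode the urlencoded string
data = urllib.parse.urlencode(values).encode("utf-8")

# create the request object (supplying data makes it a POST)
req = urllib.request.Request(url, data)
# get the data returned by the server
response = urllib.request.urlopen(req)
# process the data
page = response.read()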

GET method:

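import urllib.parse
import urllib.request
url = "http://www.google.cn/webhp"
values = {"rls": "ig"}
# build the query string and append it to the URL
data = urllib.parse.urlencode(values)
theurl = url + "?" + data
# create the request object (no data argument, so this is a GET)
req = urllib.request.Request(theurl)
# get the data returned by the server
response = urllib.request.urlopen(req)
# process the data
page = response.read()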

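The response object has two commonly used methods, geturl() and info().

geturl() returns the URL that was actually retrieved, which makes it possible to tell whether the server redirected the request. info() returns the response headers, such as the content type and character set.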
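
As a minimal sketch of how these two methods might be used together, the function below fetches a URL and reports where the request ended up and what the headers say about the content. The function name describe_response and the keys of the returned dictionary are assumptions made for illustration. Comparing the final URL with the requested one is only a simple heuristic for detecting a redirect.

```python
import urllib.request


def describe_response(url):
    """Fetch url and summarize the final URL and a few header fields.

    describe_response is a hypothetical helper name used only for this sketch.
    """
    with urllib.request.urlopen(url) as response:
        final_url = response.geturl()  # URL actually retrieved, after any redirects
        headers = response.info()      # header object with dict-like access
        return {
            "requested": url,
            "final": final_url,
            # heuristic: a different final URL suggests a redirect happened
            "redirected": final_url != url,
            "content_type": headers.get_content_type(),
            "charset": headers.get_content_charset(),
        }
```

Calling describe_response("http://google.cn/") would return a dictionary whose "redirected" entry is True if the server sent the request somewhere else. The "charset" entry is None when the server does not declare one.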

To handle Chinese text, use encode() to convert a str to bytes before sending it and decode() to convert received bytes back to a str.
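
The round trip can be sketched as follows, reusing the message text from the POST example. UTF-8 is an assumption here: a real page may use another encoding such as GBK, in which case that name should be passed to decode() instead.

```python
import urllib.parse

text = "命令行发出网页请求测试"

# str -> bytes, as needed for a request body
encoded = text.encode("utf-8")

# bytes -> str, as needed for the result of response.read()
decoded = encoded.decode("utf-8")

# urlencode() percent-encodes non-ASCII characters (as UTF-8 by default),
# and parse_qs() reverses that
query = urllib.parse.urlencode({"content": text})
restored = urllib.parse.parse_qs(query)["content"][0]
```

Here decoded and restored both equal the original text, while encoded is a bytes object and query is a plain ASCII string that is safe to append to a URL.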
