Python method to extract hyperlinks from web pages

高洛峰 (Original)
2017-02-22 16:52:18

Many people who start learning Python plan to use it to write web crawlers. A crawler must first fetch a web page and then extract the hyperlink addresses it contains. This article shares a simple way to do that, which you can refer to if needed.

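The following is the simplest implementation: first fetch the target web page, then extract the hyperlinks by using a regular expression to match the href attribute of each a tag. The pattern used here only captures absolute http:// URLs that are written in double quotes and consist of letters, digits, dots, and slashes.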
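Before the full script, the matching step on its own can be sketched against a small inline HTML string. This is only an illustration: the sample markup and the name HREF_PATTERN are made up here, and the pattern is the article's expression with the unnecessary backslashes removed.

```python
import re

# Same idea as the article's pattern: the value of a double-quoted href that
# starts with http:// and contains only letters, digits, dots, and slashes.
HREF_PATTERN = re.compile(r'href="(http://[a-zA-Z0-9./]+)"')

# Hypothetical sample markup, made up for this illustration.
sample_html = '''
<a href="http://example.com/">Home</a>
<a href="http://example.com/docs/index.html">Docs</a>
<a href="https://example.com/secure">HTTPS link</a>
<a href="/about">Relative link</a>
<a href='http://example.com/single'>Single quotes</a>
<a href="http://example.com/search?q=python">Query string</a>
'''

# findall returns only the captured group, i.e. the URL itself.
links = HREF_PATTERN.findall(sample_html)
for link in links:
    print(link)
```

Only the first two links are printed. The https link, the relative link, the single-quoted link, and the link with a query string all fall outside the pattern, which is the price of keeping the expression this simple.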

The complete script, written for Python 2, is as follows:

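# Python 2 only: urllib2 was split into urllib.request and urllib.error in Python 3.
import urllib2
import re

url = 'http://www.sunbloger.com/'

# Fetch the page and read the raw HTML.
req = urllib2.Request(url)
con = urllib2.urlopen(req)
doc = con.read()
con.close()

# Capture the value of every double-quoted href that starts with http://
# and contains only letters, digits, dots, and slashes.
links = re.findall(r'href="(http://[a-zA-Z0-9./]+)"', doc)
for a in links:
    print a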
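
The script above relies on urllib2 and the print statement, both of which exist only in Python 2. A rough Python 3 equivalent might look like the sketch below. The function names fetch_html and extract_links, the UTF-8 default encoding, and the sample markup are assumptions made for this illustration, not part of the original article.

```python
import re
import urllib.request

# The article's pattern, without the unnecessary backslashes.
HREF_PATTERN = re.compile(r'href="(http://[a-zA-Z0-9./]+)"')


def fetch_html(url, encoding="utf-8"):
    """Download a page and decode it to text.

    The UTF-8 default is an assumption; undecodable bytes are replaced.
    """
    with urllib.request.urlopen(url) as response:
        return response.read().decode(encoding, errors="replace")


def extract_links(html):
    """Return every absolute http:// link found in a double-quoted href."""
    return HREF_PATTERN.findall(html)


# Offline check with made-up markup, so the sketch runs without network access.
sample = '<p><a href="http://example.com/a.html">A</a> <a href="/b">B</a></p>'
print(extract_links(sample))

# To crawl the article's URL instead (requires network access):
# for link in extract_links(fetch_html('http://www.sunbloger.com/')):
#     print(link)
```

Keeping the download and the matching in separate functions makes it possible to try the pattern on a local string first and to switch to a real page only once the results look right.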



