
Detailed explanation of HTML parsing methods using Python's BeautifulSoup

高洛峰
2017-03-31 11:36:53

1) Search by tag:

find(tagname)         # Search for the tag named tagname, such as: find('head')
find(list)            # Search for tags whose name is in the list, such as: find(['head', 'body'])
find(dict)            # Search for tags whose name is a key of the dict, such as: find({'head': True, 'body': True}) (a BeautifulSoup 3 idiom; a list is the usual form in bs4)
find(re.compile(''))  # Search for tags whose name matches a regular expression, such as: find(re.compile('^p')) matches tags whose name starts with p
find(function)        # Search for tags for which the function returns True, such as: find(lambda tag: len(tag.name) == 1) matches tags with a one-letter name
find(True)            # Match any tag
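
The tag filters listed above can be tried out with a minimal sketch like the following. The sample markup and the variable names are made up for illustration, and the html.parser backend is an assumption (any installed parser should give the same result on this well-formed input). The dict form is left out because it is not documented for bs4.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, well-formed sample markup (not taken from the article)
html = ("<html><head><title>Page title</title></head>"
        "<body><p>This is paragraph <b>one</b>.</p></body></html>")
soup = BeautifulSoup(html, "html.parser")

by_name = soup.find("head")                          # exact tag name
by_list = soup.find(["title", "body"])               # first tag whose name is in the list
by_regex = soup.find(re.compile("^p"))               # first tag whose name starts with "p"
by_func = soup.find(lambda tag: len(tag.name) == 1)  # first tag with a one-letter name
any_tag = soup.find(True)                            # first tag of any kind

print(by_name.name, by_list.name, by_regex.name, by_func.name, any_tag.name)
```

Each call returns only the first match in document order, or None when nothing matches; find_all accepts the same filters and returns every match as a list.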

2) Search by text (the text argument, called string in Beautiful Soup 4.4.0 and later):
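
A text search accepts the same kinds of filter as a tag search. The sketch below is a minimal illustration under a few assumptions: the sample markup and variable names are made up, html.parser is picked arbitrarily, and the keyword is spelled string, which is the current name of the older text argument.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, well-formed sample markup (not taken from the article)
html = ("<html><head><title>Page title</title></head>"
        "<body><p>This is paragraph <b>one</b>.</p>"
        "<p>This is paragraph <b>two</b>.</p></body></html>")
soup = BeautifulSoup(html, "html.parser")

# Strings that contain the word "paragraph" (the regex is applied with search)
with_word = soup.find_all(string=re.compile("paragraph"))

# Every string in the document
every_string = soup.find_all(string=True)

# Strings for which the function returns True: here, shorter than 12 characters
short = soup.find_all(string=lambda s: len(s) < 12)

print(with_word)
print(every_string)
print(short)
```

The results are the matching strings themselves rather than the tags that contain them.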

3) recursive, limit (recursive=False searches only direct children; limit=n stops after n matches):

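from bs4 import BeautifulSoup
import re

doc = ['<html><head><title>Page title</title></head>',
       '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
       '<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
       '</html>']
# Name the parser explicitly; how the unclosed <p> tags are repaired depends on the parser
soup = BeautifulSoup(''.join(doc), 'html.parser')

print(soup.prettify() + "\n")
# 1) Search by tag name
print(soup.find_all('b'))

# 2) Search by text; string= is the current name of the older text= argument
print(soup.find_all(string=re.compile("paragraph")))  # strings containing "paragraph"
print(soup.find_all(string=True))                      # every string
print(soup.find_all(string=lambda x: len(x) < 12))     # strings shorter than 12 characters

# Tags whose name starts with "b"
a = soup.find_all(re.compile('^b'))
print([tag.name for tag in a])

# 3) recursive: all descendants of <html> versus its direct children only
print([tag.name for tag in soup.html.find_all()])
print([tag.name for tag in soup.html.find_all(recursive=False)])

# limit: stop after the first match
print(soup.find_all('p', limit=1))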
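
To see recursive and limit in isolation, here is a minimal sketch on well-formed markup, so the tree does not depend on how a parser repairs unclosed tags. The sample markup and variable names are made up for illustration, and html.parser is an arbitrary choice.

```python
from bs4 import BeautifulSoup

# Hypothetical, well-formed sample markup (every <p> is closed)
html = ("<html><head><title>Page title</title></head>"
        "<body><p>This is paragraph <b>one</b>.</p>"
        "<p>This is paragraph <b>two</b>.</p></body></html>")
soup = BeautifulSoup(html, "html.parser")

# Default (recursive=True): every descendant of <html>, in document order
all_descendants = [tag.name for tag in soup.html.find_all()]

# recursive=False: only the direct children of <html>
direct_children = [tag.name for tag in soup.html.find_all(recursive=False)]

# limit=1: stop after the first match; the result is still a list
first_p = soup.find_all("p", limit=1)

print(all_descendants)
print(direct_children)
print(first_p)
```

find(...) behaves like find_all(..., limit=1), except that it returns the single match itself (or None) instead of a one-element list.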
