Detailed explanation of the HTML parsing method using Python's BeautifulSoup
1) Search tag:
find(tagname) # Directly search for the tag named tagname, such as: find('head')
find(list) # Search for a tag whose name is in list, such as: find(['head', 'body'])
find(dict) # Search for a tag whose name is a key in dict, such as: find({'head':True, 'body':True})
find(re.compile('')) # Search for a tag whose name matches a regular expression, such as: find(re.compile('^p')), which matches tags whose names start with p
find(lambda) # Search for a tag for which the function returns true, such as: find(lambda tag: len(tag.name) == 1), which matches tags whose names are one character long
find(True) # Search all tags
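The search forms listed above can be tried on a small document. This is a minimal sketch, assuming bs4 is installed and using Python's built-in "html.parser"; the sample HTML and the variable names are made up for illustration.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample document, used only for this illustration.
html = ("<html><head><title>Page title</title></head>"
        "<body><p>One <b>bold</b> word.</p></body></html>")
soup = BeautifulSoup(html, "html.parser")

by_name = soup.find("head")                          # exact tag name
by_list = soup.find(["title", "body"])               # first tag whose name is in the list
by_regex = soup.find(re.compile("^b"))               # first tag whose name starts with "b"
by_func = soup.find(lambda tag: len(tag.name) == 1)  # the function receives a Tag object
all_tags = soup.find_all(True)                       # True matches every tag

print(by_name.name, by_list.name, by_regex.name, by_func.name)
print([tag.name for tag in all_tags])
```

find() returns only the first match in document order; find_all() takes the same kinds of arguments and returns every match as a list.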
2) Search text (text):
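A text search matches the strings inside tags rather than the tags themselves. The sketch below assumes Beautiful Soup 4.4.0 or later, where the argument is named string; earlier versions call it text. The sample HTML is hypothetical.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample document, used only for this illustration.
html = ('<p id="firstpara">This is paragraph <b>one</b>.</p>'
        '<p id="secondpara">This is paragraph <b>two</b>.</p>')
soup = BeautifulSoup(html, "html.parser")

# Strings that contain the word "paragraph".
matching = soup.find_all(string=re.compile("paragraph"))
# Strings for which the function returns true (here: shorter than 4 characters).
short = soup.find_all(string=lambda s: len(s) < 4)
# Every string in the document.
every = soup.find_all(string=True)

print(matching)
print(short)
print(len(every))
```

The results are string objects, not tags, so each one can be used like an ordinary Python string.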
3) recursive, limit:
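The two arguments named in this heading control how far and how long a search runs: recursive=False restricts the search to direct children, and limit stops it after the given number of matches. A minimal sketch on a hypothetical document:

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, used only for this illustration.
html = ("<html><head><title>Page title</title></head>"
        "<body><p>First <b>one</b></p><p>Second <b>two</b></p></body></html>")
soup = BeautifulSoup(html, "html.parser")

# Default (recursive=True): every descendant of <html>.
descendants = [tag.name for tag in soup.html.find_all()]
# recursive=False: only the direct children of <html>.
children = [tag.name for tag in soup.html.find_all(recursive=False)]
# limit=1: stop searching after the first matching <p>.
first_p = soup.find_all("p", limit=1)

print(descendants)
print(children)
print(first_p)
```

With limit=1, find_all() still returns a list, which here holds a single element.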
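import re
from bs4 import BeautifulSoup

doc = ['<html><head><title>Page title</title></head>',
       '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
       '<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
       '</html>']
# Name the parser explicitly so that bs4 does not have to guess one.
soup = BeautifulSoup(''.join(doc), 'html.parser')

print(soup.prettify() + "\n")
print(soup.find_all('b'))                             # all <b> tags
# string= was called text= before Beautiful Soup 4.4.0.
print(soup.find_all(string=re.compile("paragraph")))  # strings containing "paragraph"
print(soup.find_all(string=True))                     # every string
print(soup.find_all(string=lambda x: len(x) < 12))    # strings shorter than 12 characters
a = soup.find_all(re.compile('^b'))                   # tags whose names start with "b"
print([tag.name for tag in a])
print([tag.name for tag in soup.html.find_all()])                 # all descendants of <html>
print([tag.name for tag in soup.html.find_all(recursive=False)])  # direct children only
print(soup.find_all('p', limit=1))                    # stop after the first match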