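There are several ways to extract an element's value in a web crawler. Three common approaches are regular expressions, BeautifulSoup, and lxml with XPath: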
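    # Method 1: regular expressions (standard-library re module)
    import re

    html = "<a href='https://www.example.com'>Example</a>"

    # Group 1 captures the href value, group 2 the link text.
    # [^>]* keeps each match inside a single opening tag.
    links = re.findall(r"<a\s[^>]*?href=['\"](.*?)['\"][^>]*>(.*?)</a>", html)
    for url, text in links:
        print("URL:", url)
        print("Text:", text)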
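The regular-expression approach can be extended to several links at once. The following is a minimal sketch, not a definitive implementation: the sample markup, the `LINK_RE` name, and the group names `quote`, `url`, and `text` are all assumptions made for illustration. It uses named groups and a backreference so that the closing quote must match the opening one.

```python
import re

# Hypothetical sample markup, invented for this illustration.
html = (
    '<a href="https://www.example.com/a">First</a>'
    "<a class='nav' href='https://www.example.com/b'>Second</a>"
)

# Named groups document what each captured part means; the backreference
# (?P=quote) requires the closing quote to be the same as the opening one.
LINK_RE = re.compile(
    r"<a\s[^>]*?href=(?P<quote>['\"])(?P<url>.*?)(?P=quote)[^>]*>(?P<text>.*?)</a>",
    re.IGNORECASE | re.DOTALL,
)

# finditer yields one match object per link found in the string.
links = [(m.group("url"), m.group("text")) for m in LINK_RE.finditer(html)]
for url, text in links:
    print("URL:", url, "| Text:", text)
```

A pattern like this is only reasonable for short, predictable fragments; it does not handle unquoted attributes or nested `<a>` markup.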
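    # Method 2: BeautifulSoup (third-party package beautifulsoup4)
    from bs4 import BeautifulSoup

    html = "<h1>This is a title</h1>"
    soup = BeautifulSoup(html, 'html.parser')

    # find_all returns every matching tag; .text is the tag's text content.
    titles = soup.find_all('h1')
    for title in titles:
        print("Title:", title.text)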
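Besides text, crawlers often need attribute values. The sketch below assumes the `beautifulsoup4` package is installed; the sample markup and the variable names are hypothetical. It reads text with `get_text()` and attributes with `Tag.get()`, which returns `None` when the attribute is absent instead of raising an error.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this illustration.
html = """
<div id="post">
  <h1 class="title">This is a title</h1>
  <a href="https://www.example.com">Example</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# select_one returns the first element matching a CSS selector, or None.
title_tag = soup.select_one("#post h1.title")
title = title_tag.get_text(strip=True) if title_tag is not None else None

# find returns the first matching tag, or None if there is no match.
link_tag = soup.find("a")
href = link_tag.get("href") if link_tag is not None else None
target = link_tag.get("target") if link_tag is not None else None

print("Title:", title)
print("href:", href)
print("target:", target)
```

Checking for `None` before reading a value keeps the crawler from crashing when a page is missing the expected element.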
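    # Method 3: lxml with XPath (third-party package lxml)
    from lxml import etree

    html = "<p>This is a paragraph.</p>"
    tree = etree.HTML(html)

    # xpath returns a list of matching elements; .text holds the text
    # that appears before the element's first child, if any.
    paragraphs = tree.xpath('//p')
    for paragraph in paragraphs:
        print("Text:", paragraph.text)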
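With lxml, `.text` stops at the first child element, so nested markup needs a different call. The following sketch assumes the `lxml` package is installed; the sample markup and variable names are hypothetical. It contrasts `.text` with `itertext()` and shows XPath expressions that return attribute values and text nodes directly.

```python
from lxml import etree

# Hypothetical sample markup, invented for this illustration.
html = (
    "<div><p>This is a <b>bold</b> paragraph.</p>"
    "<a href='https://www.example.com'>Example</a></div>"
)
tree = etree.HTML(html)

paragraph = tree.xpath("//p")[0]

# .text covers only the text before the first child element (<b> here).
direct_text = paragraph.text

# itertext() walks the element and its descendants in document order.
full_text = "".join(paragraph.itertext())

# XPath can select attribute values and text nodes instead of elements.
hrefs = tree.xpath("//a/@href")
link_texts = tree.xpath("//a/text()")

print("Direct text:", repr(direct_text))
print("Full text:", full_text)
print("hrefs:", hrefs)
print("Link texts:", link_texts)
```

Selecting `@href` or `text()` in the XPath expression itself avoids a second loop over the matched elements.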
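All three are common approaches, and which one fits best depends on the target site and the structure of its data. Regular expressions suit short, predictable fragments, while a parser such as BeautifulSoup or lxml is usually the safer choice for nested or irregular HTML.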