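I want to scrape the country/region of production (制片国家/地区) for each movie, but on the page that value is not wrapped in a tag of its own. What should I do?
I am using requests and BeautifulSoup.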
res2=requests.get(h2)
res2.encoding='utf-8'
soup2=BeautifulSoup(res2.text, 'html.parser')
This part has already fetched the page.
ringa_lee 2017-04-18 10:21:06
Refer to the following code:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import re
import requests
from bs4 import BeautifulSoup
result = requests.get('https://movie.douban.com/subject/3541415/')
result.encoding = 'utf-8'
soup = BeautifulSoup(result.text, 'html.parser')
try:
    info = soup.select('#info')[0]
    # The lookbehind/lookahead pair captures the text between the label and the end of that line.
    print(re.findall(r'(?<=制片国家/地区: ).+?(?=\n)', info.text)[0])
except Exception as e:
    print(e)
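The lookbehind pattern above depends on exactly one space following the colon. A slightly more tolerant variant is sketched below; it runs on a made-up sample of the #info text rather than a live request, and the helper name extract_field is hypothetical, not part of any library.

```python
import re

# Hypothetical sample of the text inside the #info block (not fetched live).
SAMPLE_INFO_TEXT = """
导演: 克里斯托弗·诺兰
制片国家/地区: 美国 / 英国
语言: 英语 / 日语 / 法语
"""


def extract_field(info_text, label):
    """Return the text after 'label:' on the same line, or None if absent."""
    # [ \t]* tolerates any number of spaces or tabs around the colon;
    # [^\n]+ stops at the end of the line.
    pattern = re.escape(label) + r"[ \t]*:[ \t]*([^\n]+)"
    match = re.search(pattern, info_text)
    return match.group(1).strip() if match else None


countries = extract_field(SAMPLE_INFO_TEXT, "制片国家/地区")
print(countries)
# Split the slash-separated value into individual countries.
print([c.strip() for c in countries.split("/")])
```

With the real page, info.text from the answer above would be passed in place of SAMPLE_INFO_TEXT. Returning None for a missing label avoids the IndexError that indexing an empty findall result with [0] would raise.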
怪我咯 2017-04-18 10:21:06
1. You can use a regular expression.
2. I recommend using soup.find_all.
See the documentation:
https://www.crummy.com/softwa...
soup.find_all("title")
# [<title>The Dormouse's story</title>]
soup.find_all("p", "title")
# [<p class="title"><b>The Dormouse's story</b></p>]
soup.find_all("a")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
# <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
soup.find_all(id="link2")
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
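For the case in the question, where the value is a bare text node rather than a tag, one option is to find the label tag and read its next sibling. The sketch below assumes the label sits in a span with class "pl" and that the value follows it directly; that markup is an assumption about the page, and the sample HTML and the helper name field_after_label are made up for illustration.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical fragment imitating the #info block (assumed markup, not fetched live).
SAMPLE_HTML = """
<div id="info">
  <span class="pl">制片国家/地区:</span> 美国 / 英国<br/>
  <span class="pl">语言:</span> 英语 / 日语<br/>
</div>
"""


def field_after_label(soup, label):
    """Return the bare text that follows <span class="pl">label</span>, or None."""
    span = soup.find("span", class_="pl", string=label)
    if span is None:
        return None
    sibling = span.next_sibling
    # Only accept a plain text node; a tag here would mean the layout differs.
    if not isinstance(sibling, NavigableString):
        return None
    return str(sibling).strip()


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(field_after_label(soup, "制片国家/地区:"))
```

This avoids depending on how the text of the whole block happens to be laid out, but it breaks if the label text or the class name on the real page differs from what is assumed here.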
Answered by 黃哥Python