I used Python 3.6 to scrape some data, but what I ended up with was a list of span tags. When I call get_text, contents, etc. on it, an error is raised. Why is this?
The initial results returned are as follows:
My code is as follows:
import requests
from bs4 import BeautifulSoup
import re
# def url_list():
#     for number in range(1,21):
#         url_links=[]
#         url="X".format(i=number)
#         url_links.append(url)
h={"User-Agent":"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36"}
for data in soup.find("p",{"class":"list-main-eventset-finan"}).find_all("li"):
    content=data.find("i",{"class":"cell date"}).find_all("span")
仅有的幸福 2017-05-18 10:57:53
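I don't remember the bs4 API very clearly, but there should be a method that returns the text directly; it should be get_text(). The catch is that find_all returns a list of tags rather than a single tag, so get_text() cannot be called on the result itself. You need to loop over the returned result and call it on each element, that's all: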
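rs = list()
for data in soup.find("p",{"class":"list-main-eventset-finan"}).find_all("li"):
    contents=data.find("i",{"class":"cell date"}).find_all("span")
    for content in contents:
        # assumed loop body: collect the text of each <span>, as described above
        rs.append(content.get_text())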
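To make the point about looping concrete, here is a minimal self-contained sketch. The sample HTML is made up for illustration (the real page is not shown in the question), and it uses a ul container instead of the p from the question so that the nesting is valid under any parser; span_texts is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only mimics the class names used in the question.
SAMPLE_HTML = """
<ul class="list-main-eventset-finan">
  <li><i class="cell date"><span>2017-05-18</span></i></li>
  <li><i class="cell date"><span>2017-05-17</span><span>A round</span></i></li>
</ul>
"""

def span_texts(html):
    """Return the text of every <span> inside the 'cell date' cells."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    container = soup.find("ul", {"class": "list-main-eventset-finan"})
    for li in container.find_all("li"):
        cell = li.find("i", {"class": "cell date"})
        if cell is None:
            # Skip rows that do not have the expected cell.
            continue
        # find_all returns a list-like ResultSet, so get_text() has to be
        # called on each tag in it, not on the ResultSet itself.
        for span in cell.find_all("span"):
            texts.append(span.get_text(strip=True))
    return texts

texts = span_texts(SAMPLE_HTML)
print(texts)
```

Calling get_text() directly on the value returned by find_all raises an AttributeError, which is most likely the error described in the question.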
In addition, you can also use a regular expression to match the pattern <span>(.*?)< directly.
You still have to traverse the contents list as above, though.
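A rough sketch of the regular-expression route, assuming each item is converted to a string first and that the spans have no attributes or nested tags (the pattern would miss those). The fragments list and the name span_texts_by_regex are made up for illustration.

```python
import re

# Lazy match: capture everything after "<span>" up to the next "<".
SPAN_PATTERN = re.compile(r"<span>(.*?)<")

def span_texts_by_regex(fragments):
    """Apply the pattern to each fragment and collect all captured texts."""
    texts = []
    for fragment in fragments:
        # str() lets this accept bs4 Tag objects as well as plain strings.
        texts.extend(SPAN_PATTERN.findall(str(fragment)))
    return texts

# Stand-in for the tags that find_all("span") would return.
fragments = ["<span>2017-05-18</span>", "<span>A round</span>"]
regex_texts = span_texts_by_regex(fragments)
print(regex_texts)
```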
ringa_lee 2017-05-18 10:57:53
Regular expressions, or split plus substring slicing, also work here; use whichever fits the data best.
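For completeness, a small sketch of the split and substring approaches on a single tag rendered as a string. Both helper names are hypothetical, and both assume a plain `<span>...</span>` with no attributes.

```python
def span_text_by_split(fragment):
    """Keep what follows the first "<span>" and cut at the next "<"."""
    after_open = fragment.split("<span>", 1)[1]
    return after_open.split("<", 1)[0]

def span_text_by_slice(fragment):
    """Locate the opening and closing tags and slice out the text between them."""
    start = fragment.index("<span>") + len("<span>")
    end = fragment.index("</span>", start)
    return fragment[start:end]

sample = "<span>2017-05-18</span>"
split_text = span_text_by_split(sample)
slice_text = span_text_by_slice(sample)
print(split_text, slice_text)
```

These are more fragile than get_text(): any change in the markup, such as an added class attribute, breaks them.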