python - After scraping a div with bs (BeautifulSoup), how do I get only the outermost tag?

大家讲道理 · 2898 days ago · 319

All replies (2)

  • PHPz

    PHPz 2017-04-18 10:05:34

        <p class="l_post l_post_bright j_l_post clearfix " data-field='{"author":{"user_id":348570172,   "user_name":"\u6446\u6446\u821e\u66f2","props":null},"content":{"post_id":31489927386,"is_anonym":false,"forum_id":874949,"thread_id":2108034524,"content":"912904081@qq.com\u8c22\u8c22\u6492","post_no":94,"type":"0","comment_num":0,"props":null,"post_index":0,"pb_tpoint":null}}'> <p class="d_author"> <ul class="p_author">
        ...
        </p>
    

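    I want to scrape user_name and content, which sit in the attributes of this p's outermost tag. There are many, many nested tags inside it, and what I get back is everything inside the p. How do I keep only the outermost tag, which is the part I need?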

  • 天蓬老师

    天蓬老师 2017-04-18 10:05:34

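       import requests
       from bs4 import BeautifulSoup

       r = requests.get("http://tieba.baidu.com/p/2108034524?pn=4")
       soup = BeautifulSoup(r.content, "lxml")
       # Match only the outer post containers
       users = soup.find_all("p", class_="l_post")
       for user in users:
           print(user["data-field"])  # attribute of the outer tag itself, not its children
           # other processing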
    

    Then process the extracted content further.
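
    One possible way to do that further processing, as a minimal sketch: the data-field value shown in the question looks like JSON, so it can probably be decoded with the standard json module. The helper name parse_data_field and the shortened sample string are assumptions made for illustration, and the sketch assumes every post carries both the author and content keys.

```python
import json


def parse_data_field(raw):
    """Decode one data-field attribute value and return (user_name, content).

    Hypothetical helper: assumes the JSON layout shown in the question,
    with "author" -> "user_name" and "content" -> "content".
    """
    data = json.loads(raw)  # also turns \uXXXX escapes into real characters
    user_name = data["author"]["user_name"]
    content = data["content"]["content"]
    return user_name, content


# Shortened sample modelled on the attribute value in the question.
sample = (
    r'{"author":{"user_id":348570172,'
    r'"user_name":"\u6446\u6446\u821e\u66f2","props":null},'
    r'"content":{"post_id":31489927386,"content":"\u8c22\u8c22","post_no":94}}'
)

user_name, content = parse_data_field(sample)
print(user_name, content)
```

    Inside the loop from the answer, the call would be parse_data_field(user["data-field"]). Reading the attribute this way touches only the outer tag, so the nested tags never need to be stripped out.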
