
html - Python crawler: how do I scrape paginated data when the URL does not change?

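URL: http://quote.eastmoney.com/ce...
I want to scrape the name data from every page (there are only two pages here). How should I write the condition that checks whether there is a next page?
Code: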

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.PhantomJS()

url = 'http://quote.eastmoney.com/center/list.html#28003684_0_2'
driver.get(url)
usoup = BeautifulSoup(driver.page_source, 'lxml')
n = []
while True:
    t = usoup.find('table', {'id': 'fixed'})
    utable = t.find_all('a', {'target': '_blank'})
    # Every 6th link, starting at index 1, is a name.
    for i in range(len(utable)):
        if i % 6 == 1:
            n.append(utable[i].text)
    # How should the stop condition be written?
    # if <stop condition>:
    #     break
    # Click the "next page" link, then re-parse the updated page.
    driver.find_element_by_xpath(r'//*[@id="pagenav"]/a[2]').click()
    usoup = BeautifulSoup(driver.page_source, 'lxml')

I don't know how to write the rest from here...

大家讲道理 | 2741 days ago | 804

All replies (4)

  • 巴扎黑 2017-04-18 10:33:18

    You can check the number of entries on each page. Each page holds 20 entries, so if the current page has fewer than 20, it is the last page. Stop after scraping it.
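A minimal sketch of that stop condition, kept independent of selenium so it can be run on its own. The names `collect_names`, `read_page`, and `click_next` are hypothetical helpers, not part of the question's code: `read_page` is assumed to return the list of names on the current page, and `click_next` is assumed to advance to the next page. The extra "same page as before" check is an added guard for the case where the last page happens to hold exactly 20 entries.

```python
def collect_names(read_page, click_next, page_size=20, max_pages=1000):
    """Collect names page by page until a short (last) page is seen.

    read_page and click_next are hypothetical callables supplied by the
    caller; page_size=20 follows the reply above.
    """
    names = []
    previous = None
    for _ in range(max_pages):  # hard upper bound so the loop always ends
        rows = read_page()
        # Guard: clicking "next" on the last page may leave the page unchanged.
        if rows == previous:
            break
        names.extend(rows)
        # Stop condition: a page with fewer than page_size entries is the last.
        if len(rows) < page_size:
            break
        previous = rows
        click_next()
    return names
```

In the question's code, `read_page` would wrap the BeautifulSoup parsing of `driver.page_source`, and `click_next` would wrap the click on the `pagenav` link.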

  • PHP中文网 2017-04-18 10:33:18

    By the way, doesn't this table have a JSONP interface that returns the data? Why scrape the page at all?

  • PHPz 2017-04-18 10:33:18

    The page loads its data from a JSONP interface, so just request that directly.

    If you really have to scrape the rendered page, you will need a browser-simulation tool such as selenium + phantomjs.

  • 伊谢尔伦 2017-04-18 10:33:18

    http://nufm.dfcfw.com/EM_Fina...{rank:[(x)],pages:(pc)}&token=7bc05d0d4c3c22ef9fca8c2a912d779c&jsName=quote_123&_g=0.5385195357178545
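A rough sketch of how a response from an interface like this might be parsed. The payload shape is an assumption inferred only from the `{rank:[(x)],pages:(pc)}` template and `jsName=quote_123` in the URL above: something like `var quote_123={rank:[...],pages:N}`, with `rank` holding a JSON array of comma-separated strings. The real response may differ, and `parse_quote_payload` is a hypothetical helper. If the shape holds, `pages` gives the total page count, which answers the stop-condition question directly.

```python
import json
import re


def parse_quote_payload(text):
    """Parse a payload assumed to look like: var quote_123={rank:[...],pages:2}

    The keys are unquoted, so the whole payload is not valid JSON; only the
    rank array is handed to json.loads.
    """
    match = re.search(r'\{rank:(\[.*\]),pages:(\d+)\}', text, re.S)
    if match is None:
        raise ValueError("unexpected payload shape")
    rows = json.loads(match.group(1))  # assumed: a JSON array of strings
    pages = int(match.group(2))        # total number of pages
    return rows, pages


# Made-up sample data, for illustration only.
sample = 'var quote_123={rank:["1,600000,NameA","2,600004,NameB"],pages:2}'
rows, pages = parse_quote_payload(sample)
```

With the page count known, a plain `for page in range(1, pages + 1)` loop over the request's page parameter would replace the click-and-check loop.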
