
Web crawler - Python + Selenium + PhantomJS: how do I get the source of a newly opened page?

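I am writing a Python crawler using the selenium library and the PhantomJS browser. On one page I triggered a click event that opened a new page, but browser.page_source still returns the source of the original page rather than the newly opened one. How can I get the source of the newly opened page?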

高洛峰  2740 days ago  768

All replies (2)

  • 黄舟

    2017-04-18 10:23:55

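    If the link opens a new tab or window, the driver keeps pointing at the original window by default, which is why page_source still returns the old page. To work with the new page, pass its window handle to the switch_to_window() method (written driver.switch_to.window() in newer Selenium releases). Knowing this, you can iterate over every open window like so: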

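    # Visit each open window/tab in turn; after the call the driver acts on that window.
    for handle in driver.window_handles:
        driver.switch_to.window(handle)  # switch_to_window(handle) in older Selenium releases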

    window_handles is a list of handles, one for each open window or tab. So if only one page was open before the click, the newly opened page is typically window_handles[1], the last entry in the list.
    After switching to that window, read driver.page_source to get the new page's source.
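
    Rather than relying on the position of the new handle in the list, one option is to record the handles before the click and switch to whichever handle appears afterwards. The helper below is a minimal sketch of that idea; the name switch_to_new_window and its arguments are made up for illustration, and it assumes a driver that exposes window_handles and switch_to.window() as current Selenium releases do.

```python
def switch_to_new_window(driver, handles_before):
    """Switch to the first window handle that was not open before the click.

    `driver` is assumed to be a Selenium WebDriver (or anything exposing
    `window_handles` and `switch_to.window`); `handles_before` is the
    collection of handles recorded before triggering the click.
    """
    for handle in driver.window_handles:
        if handle not in handles_before:
            # On old Selenium releases this call was driver.switch_to_window(handle).
            driver.switch_to.window(handle)
            return handle
    raise RuntimeError("no new window or tab was opened")
```

    A typical use would be to store set(driver.window_handles) before the click, perform the click, call switch_to_new_window(driver, handles_before), and then read driver.page_source.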

  • 天蓬老师

    2017-04-18 10:23:55

    If the link opens in the current window instead, the new page may not have finished loading when you read it, so its URL and content are not available yet. Use an explicit wait with a condition that holds only once the new page has loaded, for example the presence of an element that exists only on that page. The code is as follows:

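    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions
    from selenium.webdriver.support.ui import WebDriverWait

    # Wait up to 5 seconds for the new page to render the element with id "username"
    WebDriverWait(self.browser, 5).until(
        expected_conditions.presence_of_element_located((By.ID, "username"))
    )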
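
    To make clear what such a wait does, here is a simplified plain-Python polling loop in the same spirit. It is an illustrative stand-in, not Selenium's actual implementation; the function name wait_until and its parameters are assumptions made for this sketch.

```python
import time


def wait_until(condition, timeout=5.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have elapsed without the condition becoming truthy.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)
```

    With a real driver, the condition could be something like lambda: driver.find_elements(By.ID, "username"), which returns an empty (falsy) list until the element exists.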
