Web crawler - Error when using selenium + PhantomJS with Python on Ubuntu

PHP中文网 2803 days ago 890

All replies (4)

  • 迷茫

    迷茫 2017-04-17 14:35:09

    I ran into this recently too. My guess is that the page's dynamic JavaScript had not finished running when the element was looked up, so the markup you wanted was not in the DOM yet. That would explain the NoSuchElementException.
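
If the cause really is a lookup that runs before the scripts finish, retrying the lookup for a few seconds is the usual workaround. Below is a minimal sketch of that idea in plain Python. The helper name `wait_until` and the element id in the usage comment are hypothetical, not from the original question. Selenium also ships its own `WebDriverWait` class for the same purpose, which is normally the better choice in real code.

```python
import time


def wait_until(lookup, timeout=10.0, interval=0.5, ignored=(Exception,)):
    """Call lookup() repeatedly until it succeeds or `timeout` seconds pass.

    `lookup` is any zero-argument callable. Exceptions listed in `ignored`
    (for Selenium this would typically be NoSuchElementException) are
    swallowed and the call is retried after `interval` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return lookup()
        except ignored as exc:
            # Give up once the deadline has passed, keeping the last error.
            if time.monotonic() >= deadline:
                raise TimeoutError("lookup did not succeed in time") from exc
        time.sleep(interval)


# Hypothetical usage with a Selenium browser object:
#   element = wait_until(lambda: browser.find_element_by_id("content"))
```

The helper only retries. It cannot make an element appear if the page never creates it, so a `TimeoutError` here points at a different cause.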

  • PHP中文网

    PHP中文网 2017-04-17 14:35:09

    There is another possibility. PhantomJS is a headless browser with no window, so some elements may never be rendered. Any lookup of such an element then raises NoSuchElementException.
    Try setting a window size before loading the page:

    from selenium import webdriver

    browser = webdriver.PhantomJS()
    browser.set_window_size(800, 600)  # give the headless browser an explicit viewport
    browser.get("http://example.com")  # load the page

    Reference: https://github.com/ariya/phantomjs/issues/11637

  • 怪我咯

    怪我咯 2017-04-17 14:35:09

    Answering my own question.
    I found a solution on Stack Overflow: block CSS, images, and JavaScript to speed up page loads.
    PhantomJS still does not work for me, but Firefox is noticeably faster this way, which was the goal.

    from selenium import webdriver

    firefox_profile = webdriver.FirefoxProfile()
    firefox_profile.set_preference("browser.download.folderList", 2)  # 2 = use a custom download directory
    firefox_profile.set_preference("permissions.default.stylesheet", 2)  # block CSS
    firefox_profile.set_preference("permissions.default.image", 2)  # block images
    firefox_profile.set_preference("javascript.enabled", False)  # disable JavaScript (the pref is "javascript.enabled")

    browser = webdriver.Firefox(firefox_profile=firefox_profile)
    

    http://stackoverflow.com/questions/20892768/how-to-speed-up-browsing-in-selenium-firefox
    http://stackoverflow.com/questions/17462884/is-selenium-slow-or-is-my-code-wrong

  • 阿神

    阿神 2017-04-17 14:35:09

    With JavaScript disabled like this, the page's scripts won't run either, will they? In that case, why not use a faster tool than a full browser?
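
To illustrate the point: if the data you need is already in the static HTML, a plain HTTP request plus an HTML parser may be enough, with no browser at all. The sketch below uses only the Python standard library. The names `LinkCollector`, `extract_links` and `fetch` are made up for this example, and it assumes the target page does not rely on JavaScript to produce its content.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a static HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def fetch(url, timeout=10):
    """Download a page as text. No JavaScript is executed."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Possible usage (performs a network request):
#   links = extract_links(fetch("http://example.com"))
```

This skips browser start-up entirely, but it only sees what the server sends. Anything built by scripts after page load will be missing.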
