
How to implement headless crawling with Python + Selenium + ChromeDriver

While using Selenium to crawl 12306, I found that PhantomJS does not work but ChromeDriver does. Presumably the site detects and blocks PhantomJS. ChromeDriver, however, opens a visible browser window, and crawling is slow.
I have two questions. I have searched Google for a long time and have not found an effective solution:
1. How can I disguise PhantomJS as well as possible?
2. How can I configure ChromeDriver so that it does not display a browser window? Or are there other ways to improve crawling efficiency?

Many thanks!

迷茫迷茫 · 2711 days ago · 846 views

Replies (2)

  • PHP中文网, 2017-05-18 10:55:13

    You can do this with PyVirtualDisplay, which runs the browser inside an invisible virtual display. The code looks roughly like this:

    #!/usr/bin/env python3

    from pyvirtualdisplay import Display
    from selenium import webdriver

    # Start an invisible X display (needs Xvfb installed; Linux/X11 only).
    display = Display(visible=0, size=(800, 600))
    display.start()

    try:
        # Chrome now runs inside the virtual display,
        # so no browser window appears on your screen.
        browser = webdriver.Chrome()
        try:
            browser.get('http://www.baidu.com')
            print(browser.title)
        finally:
            browser.quit()
    finally:
        display.stop()
    

    I'm not sure whether you have changed PhantomJS's request headers. With Chrome, you can set the browser language and user agent through ChromeOptions:

    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument('--lang=zh-CN')
    # No extra quotes around the value: they would become part of the UA string.
    options.add_argument('--user-agent=Mozilla/5.0 (iPod; U; CPU iPhone OS 2_1 like Mac OS X; ja-jp) AppleWebKit/525.18.1 (KHTML, like Gecko) Version/3.1.1 Mobile/5F137 Safari/525.20')
    # Selenium 4 removed the chrome_options= keyword; use options= instead.
    browser = webdriver.Chrome(options=options)
    try:
        browser.get("https://baidu.com")
    finally:
        browser.quit()
    

    This changes the language and User-Agent headers that Chrome sends. You can try the same idea with PhantomJS.
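
    To check that a user-agent override actually took effect, you can read `navigator.userAgent` from the loaded page. This is a minimal sketch assuming Selenium 4 and a matching Chrome/ChromeDriver; the helper names `disguise_args` and `reported_user_agent` and the desktop UA string `CUSTOM_UA` are hypothetical examples, not part of the answer above.

```python
"""Sketch: set a custom user agent in Chrome and confirm what the page sees."""

# Hypothetical example value; substitute the user agent you actually need.
CUSTOM_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def disguise_args(user_agent: str, lang: str = "zh-CN") -> list[str]:
    """Chrome switches for the UI language and a user-agent override.

    The value is not wrapped in quotes: Selenium hands each argument to
    Chrome as-is, so quotes would likely end up inside the UA string.
    """
    return [f"--lang={lang}", f"--user-agent={user_agent}"]


def reported_user_agent(url: str, user_agent: str) -> str:
    """Load url and return navigator.userAgent as the page's JavaScript sees it."""
    # Imported here so disguise_args() can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in disguise_args(user_agent):
        options.add_argument(arg)
    browser = webdriver.Chrome(options=options)
    try:
        browser.get(url)
        return browser.execute_script("return navigator.userAgent")
    finally:
        browser.quit()
```

    Usage: `reported_user_agent("https://www.baidu.com", CUSTOM_UA) == CUSTOM_UA` should be true if Chrome applied the override.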

  • 世界只因有你, 2017-05-18 10:55:13

    See my article on running Selenium in headless mode.
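
    Headless Chrome answers question 2 directly: no window is shown, and unlike PyVirtualDisplay it needs no Xvfb. This is a minimal sketch assuming Selenium 4 and a recent Chrome (the `--headless=new` switch needs Chrome 109 or later; older versions take plain `--headless`). The helper names `headless_chrome_args` and `fetch_title` and the default window size are illustrative assumptions, not taken from the article.

```python
"""Minimal sketch: run Chrome headless through Selenium (no visible window)."""


def headless_chrome_args(window_size: str = "1280,800") -> list[str]:
    """Command-line switches for a windowless Chrome session.

    "--headless=new" selects the newer headless mode (Chrome 109+).
    An explicit window size keeps page layout predictable, since the
    headless default viewport is small.
    """
    return ["--headless=new", f"--window-size={window_size}"]


def fetch_title(url: str) -> str:
    """Open url in headless Chrome and return the page title."""
    # Imported here so headless_chrome_args() can be used without Selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in headless_chrome_args():
        options.add_argument(arg)
    browser = webdriver.Chrome(options=options)
    try:
        browser.get(url)
        return browser.title
    finally:
        # Always shut down the browser process, even if get() fails.
        browser.quit()
```

    For example, `fetch_title("https://www.baidu.com")` loads the page without opening a window. Headless Chrome also runs on Windows and macOS, where Xvfb-based approaches are not available.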
