
Detailed explanation of C# using Selenium+PhantomJS to capture data

迷茫 (Original)
2017-03-26 16:29:48

The project I am working on needs to capture data from a website whose pages are rendered with JavaScript. Fetching a page with the commonly used HttpClient returns HTML that contains none of the data. After searching on Baidu, I found that the solution almost everyone recommends is PhantomJS. PhantomJS is a headless WebKit browser (no user interface) that executes JavaScript and renders pages the same way a regular browser does. Selenium is a web testing framework, and driving PhantomJS from Selenium is a natural fit. Most of the examples online are in Python, though. Reluctantly, I installed Python and followed a tutorial, but got stuck on a problem importing Selenium. So I gave up and went back to my usual C#, because I refused to believe it could not be done there. After half an hour of fiddling I had it working (compared with an hour spent fiddling with Python). I am writing this post so that C# novices like me can use PhantomJS too.

Step 1: Open Visual Studio 2017, create a new console project, and open the NuGet package manager.

Step 2: Search for Selenium and install Selenium.WebDriver. Note: if you want to use a proxy, it is best to install version 3.0.0.

Step 3: Write the code shown below. Running it at this point reports an error, because phantomjs.exe cannot be found. You can either download the executable yourself or continue to Step 4.

using System;
using OpenQA.Selenium;
using OpenQA.Selenium.PhantomJS;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            var url = "http://www.baidu.com";
            IWebDriver driver = new PhantomJSDriver(GetPhantomJSDriverService());
            driver.Navigate().GoToUrl(url);
            // Print the HTML of the page after PhantomJS has loaded it
            Console.WriteLine(driver.PageSource);
            Console.Read();
            // Shut down the phantomjs.exe process when finished
            driver.Quit();
        }

        private static PhantomJSDriverService GetPhantomJSDriverService()
        {
            PhantomJSDriverService pds = PhantomJSDriverService.CreateDefaultService();
            // Set the proxy server address
            //pds.Proxy = $"{ip}:{port}";
            // Set the proxy server authentication information
            //pds.ProxyAuthentication = GetProxyAuthorization();
            return pds;
        }
    }
}
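The proxy lines in the code above are commented out, and the GetProxyAuthorization() helper they call is not shown in this post. As a rough sketch of what enabling them might look like, the snippet below fills in the service properties directly. The host, port, and the CreateServiceWithProxy name are hypothetical placeholders, and the "user:password" format is an assumption based on the PhantomJS --proxy-auth command-line option, so check it against your own proxy setup.

```csharp
using OpenQA.Selenium.PhantomJS;

static class ProxyExample
{
    // Hypothetical placeholder values; replace with your own proxy details.
    const string ProxyHost = "127.0.0.1";
    const int ProxyPort = 8888;

    // Hypothetical helper (not from the original post): builds a driver
    // service that routes PhantomJS traffic through an HTTP proxy.
    public static PhantomJSDriverService CreateServiceWithProxy(string user, string password)
    {
        PhantomJSDriverService pds = PhantomJSDriverService.CreateDefaultService();

        // Proxy server address in "host:port" form
        pds.Proxy = $"{ProxyHost}:{ProxyPort}";
        pds.ProxyType = "http";

        // Assumption: credentials are passed as "user:password"
        pds.ProxyAuthentication = $"{user}:{password}";
        return pds;
    }
}
```

The returned service can be passed to the PhantomJSDriver constructor in the same way as the result of GetPhantomJSDriverService().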

Step 4: Open NuGet again and install the Selenium.PhantomJS.WebDriver package.

Step 5: Run the program. You can see that phantomjs.exe is now fetched automatically (the package from Step 4 supplies it), so there is nothing to download by hand.

Okay, now you can start capturing your data.
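Because the target pages are filled in by JavaScript, the data may not be present the instant navigation returns. The following is a minimal sketch, not a definitive implementation, of polling for elements before reading them. The "a" CSS selector, the 10-second timeout, and the 500 ms polling interval are illustrative assumptions; substitute a selector that matches the data on your own page.

```csharp
using System;
using System.Threading;
using OpenQA.Selenium;
using OpenQA.Selenium.PhantomJS;

class ScrapeExample
{
    static void Main()
    {
        // The using block disposes the driver, which also ends phantomjs.exe
        using (IWebDriver driver = new PhantomJSDriver(PhantomJSDriverService.CreateDefaultService()))
        {
            driver.Navigate().GoToUrl("http://www.baidu.com");

            // Poll until at least one matching element exists or the timeout expires.
            // "a" is a placeholder selector chosen only for illustration.
            DateTime deadline = DateTime.UtcNow.AddSeconds(10);
            var items = driver.FindElements(By.CssSelector("a"));
            while (items.Count == 0 && DateTime.UtcNow < deadline)
            {
                Thread.Sleep(500);
                items = driver.FindElements(By.CssSelector("a"));
            }

            // Print the visible text of each element that was found
            foreach (IWebElement item in items)
            {
                Console.WriteLine(item.Text);
            }
        }
    }
}
```

A hand-written polling loop is used here so that the sketch needs only the Selenium.WebDriver package installed in Step 2.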
