
Getting the Bitcoin price with Html Agility Pack in C#

<p>I need to get the Bitcoin price from https://coinmarketcap.com/currencies/bitcoin/ using Html Agility Pack. I started from this example, which works fine:</p>
<pre class="brush:php;toolbar:false;">var html = @"http://html-agility-pack.net/";
HtmlWeb web = new HtmlWeb();
var htmlDoc = web.Load(html);
var node = htmlDoc.DocumentNode.SelectSingleNode("//head/title");
Console.WriteLine("Node Name: " + node.Name + "\n" + node.OuterHtml);</pre>
<p>The XPath of the price element is: <code>//*[@id="__next"]/div/div[1]/div[2]/div/div[1]/div[2]/div/div[2]/div[1]/div</code></p>
<p>Its HTML is:</p>
<pre class="brush:php;toolbar:false;"><div class="priceValue "><span>$17,162.42</span></div></pre>
<p>I tried the following code, but it fails with "Object reference not set to an instance of an object":</p>
<pre class="brush:php;toolbar:false;">var html = @"https://coinmarketcap.com/currencies/bitcoin/";
HtmlWeb web = new HtmlWeb();
var htmlDoc = web.Load(html);
var node = htmlDoc.DocumentNode.SelectSingleNode("//div[@class='priceValue']/span");
Console.WriteLine("Node Name: " + node.Name + "\n" + node.InnerText);</pre>
P粉156532706 · 410 days ago · 643

All replies (1)

  • P粉729518806

    2023-09-06 09:17:49

    TLDR:

    1. You need to tell HtmlWeb to decompress the response (or use a suitable HTTP client)
    2. You need to fix the XPath selector

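    The SelectSingleNode() call returns null because it cannot find a matching node, and dereferencing that null (node.Name) is what throws "Object reference not set to an instance of an object".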

    In a case like this, it helps to inspect the HTML that was actually loaded, which you can do by reading htmlDoc.DocumentNode.InnerHtml. I tried this, and the "HTML" that comes back is unreadable garbage.
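
    As a rough sketch of that diagnostic step (reusing the question's URL and selector; the 500-character preview length is an arbitrary choice of mine), you could guard against the null and dump the start of whatever was loaded:

    ```csharp
    using System;
    using HtmlAgilityPack;

    var web = new HtmlWeb();
    HtmlDocument htmlDoc = web.Load("https://coinmarketcap.com/currencies/bitcoin/");

    HtmlNode node = htmlDoc.DocumentNode.SelectSingleNode("//div[@class='priceValue']/span");
    if (node == null)
    {
        // No match: print the beginning of the loaded document so you can
        // see whether it is real markup or something else entirely.
        string raw = htmlDoc.DocumentNode.InnerHtml;
        Console.WriteLine(raw.Substring(0, Math.Min(500, raw.Length)));
    }
    else
    {
        Console.WriteLine(node.InnerText);
    }
    ```

    If the preview is not recognisable HTML, the problem is in how the page was fetched rather than in the XPath.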

    The reason is that HtmlWeb does not decompress the response it receives by default, so the parser is handed compressed data instead of markup. See this GitHub issue for details. If you used a proper HTTP client (like this), or if the HtmlAgilityPack developers were more proactive, I don't think you would run into this problem.
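
    For illustration, here is a minimal sketch of the "proper HTTP client" route using the standard HttpClient and handing the downloaded text to HtmlAgilityPack. It assumes a console app with top-level statements (C# 9 or later); the choice of GZip plus Deflate is my assumption, not something the original example specifies:

    ```csharp
    using System;
    using System.Net;
    using System.Net.Http;
    using HtmlAgilityPack;

    const string url = "https://coinmarketcap.com/currencies/bitcoin/";

    // Let the handler decompress gzip/deflate responses transparently.
    var handler = new HttpClientHandler
    {
        AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
    };
    using var client = new HttpClient(handler);

    // Download the page as text, then parse it with HtmlAgilityPack.
    string body = await client.GetStringAsync(url);
    var doc = new HtmlDocument();
    doc.LoadHtml(body);

    Console.WriteLine(doc.DocumentNode.SelectSingleNode("//head/title")?.InnerText ?? "no title found");
    ```

    Fetching and parsing are separate here, so you can log or save the raw response before any XPath is involved.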

    If you insist on using HtmlWeb, your code should look like this:

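    using System.Net;          // DecompressionMethods
    using HtmlAgilityPack;     // HtmlWeb, HtmlDocument, HtmlNode

    const string url = "https://coinmarketcap.com/currencies/bitcoin/";

    // Ask HtmlWeb to decompress gzip-encoded responses before parsing.
    var web = new HtmlWeb
    {
        AutomaticDecompression = DecompressionMethods.GZip
    };
    HtmlDocument doc = web.Load(url);

    // The class attribute is "priceValue " with a trailing space.
    HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='priceValue ']/span");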

    Note that the class of the element you are looking for is actually "priceValue " (with a trailing space); there is another div on the page whose class is priceValue. That is a separate issue, though, and you should eventually be able to find a more robust selector. Maybe try this:

    HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[contains(@class, 'priceSection')]//div[contains(@class, 'priceValue')]/span");
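
    To see why the trailing space matters, here is a small offline sketch that parses an inline fragment instead of the live page. The inner div is the markup from the question; the surrounding priceSection wrapper is an assumption based on the selector above, not something copied from the site:

    ```csharp
    using System;
    using HtmlAgilityPack;

    // Inline sample; the priceSection wrapper is hypothetical.
    const string sample =
        "<div class='priceSection'><div class='priceValue '><span>$17,162.42</span></div></div>";

    var doc = new HtmlDocument();
    doc.LoadHtml(sample);

    // Exact comparison: 'priceValue' is not equal to 'priceValue ', so nothing matches.
    HtmlNode exact = doc.DocumentNode.SelectSingleNode("//div[@class='priceValue']/span");

    // Substring comparison: tolerant of the trailing space and of extra classes.
    HtmlNode loose = doc.DocumentNode.SelectSingleNode(
        "//div[contains(@class, 'priceSection')]//div[contains(@class, 'priceValue')]/span");

    Console.WriteLine("exact match:    " + (exact?.InnerText ?? "not found"));
    Console.WriteLine("contains match: " + (loose?.InnerText ?? "not found"));
    ```

    Keep in mind that contains() also matches longer class names that merely include the text priceValue, so anchoring it under a parent element, as above, helps narrow the result.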
