
Colly - How to get the value of a child property?

WBOY, reposted 2024-02-11 09:36:08


In this article, php editor Xigua introduces Colly, a simple and flexible web scraping framework written in Go. Colly provides callbacks for selecting HTML elements, extracting data, and handling requests and responses. A common task is reading a value from a child element of the node you matched, such as the href attribute of a link or the text inside a heading. The question below asks how to get the text of a child element, and the answer shows which Colly method to use.

Question content

Here is a sample page I have been working on: https://www.lazada.vn/-i1701980654-s7563711492.html

This is the element I want to extract (the product title):

...
<div>
    <img src="https://lzd-img-global.slatic.net/g/tps/imgextra/i1/o1cn01juoyif22n3uu7jx4r_!!6000000007107-2-tps-162-48.png" class="pdp-mod-product-badge" alt="lazmall">
    <h1 class="pdp-mod-product-badge-title">
        yierku 【free shipping miễn phí vận chuyển】giày nam mùa thu và mùa đông giày thường xu hướng nam thể thao tất cả các trận đấu giày da tăng chiều cao giày nam
    </h1>
</div>
...

I want to get the text between the <h1> tags, that is, yierku 【free shipping miễn phí vận chuyển】giày n....

Here's what I've tried so far:

c := colly.NewCollector()
c.OnError(func(_ *colly.Response, err error) {
    log.Println("Something went wrong:", err)
})
c.OnXML("/html/body", func(e *colly.XMLElement) {
    // ChildAttrs returns the "class" attribute value of every matching h1
    child := e.ChildAttrs("div[4]/div/div[3]/div[2]/div/div[1]/div[3]/div/div/h1", "class")
    fmt.Println(child)
})
c.Visit("https://www.lazada.vn/-i1701980654-s7563711492.html")

It returns pdp-mod-product-badge-title, which is the value of the h1 element's class attribute.

When I change that line to

child := e.ChildAttrs("div[4]/div/div[3]/div[2]/div/div[1]/div[3]/div/div/h1", "text")

it returns nothing.

Solution

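Use func (*XMLElement) ChildText instead. ChildAttrs returns attribute values, and the product title is not an attribute of the h1 element but its text content, so there is no attribute named "text" to read. ChildText returns the text content of the matching element with surrounding whitespace stripped.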

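package main

import (
    "fmt"

    "github.com/gocolly/colly/v2"
)

func main() {
    c := colly.NewCollector()

    // Report request or HTTP errors instead of failing silently.
    c.OnError(func(_ *colly.Response, err error) {
        fmt.Println("Something went wrong:", err)
    })

    // Run the callback on the <body> node of each visited page.
    c.OnXML("/html/body", func(e *colly.XMLElement) {
        // ChildText returns the trimmed text content of the h1 reached by this
        // XPath (relative to <body>), not one of its attributes.
        child := e.ChildText("div[4]/div/div[3]/div[2]/div/div[1]/div[3]/div/div/h1")
        fmt.Println(child)
    })

    // Visit blocks until the page has been fetched and all callbacks have run.
    if err := c.Visit("https://www.lazada.vn/-i1701980654-s7563711492.html"); err != nil {
        fmt.Println("Visit failed:", err)
    }
    // Output:
    // Yierku 【Free Shipping Miễn phí vận chuyển】Giày nam mùa thu và mùa đông giày thường xu hướng nam thể thao tất cả các trận đấu giày da tăng chiều cao giày nam
}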


Statement:
This article is reproduced from stackoverflow.com. In case of infringement, please contact admin@php.cn to have it removed.