
How to access dynamic HTML elements via web scraping?

王林
2024-02-09 09:51:17


When scraping web pages, we sometimes run into content that is generated dynamically and is not present in the DOM until the page's scripts have run. This article looks at one such case using go-rod, a browser automation library for Go, and shows how to reach an element that only appears after a search box has been filled in.

Question content

I am using go-rod for web scraping. I want to access a link inside a dynamically generated <a> element. To make this <a> visible, I have to fill in a search box, which is an input of the following form (with no submit button):

<form>
    <input> <!-- this is the search box -->
</form>

Once I have typed the query, the <a> I want to access appears.

Up to this point everything works. This is the code I use to fill in the search box:

// Open the page
page := rod.New().MustConnect().MustPage("https://www.sofascore.com/")

// Accept the cookies alert
page.MustElement("cookiesAlertSelector...").MustClick()

// Fill in the search box
el := page.MustElement(`searcherSelector...`)
el.MustInput("Lionel Messi")

The problem arises when I try to click the <a> that appears after the search box has been filled in.

I tried this:

divIWant := page.MustElement("aSelector...")
divIWant.MustClick()

and this:

divIWant := page.MustElement("aSelector...").MustWaitVisible()
divIWant.MustClick()

However, both attempts fail with the same error:

panic: {-32000 Node is detached from document }
goroutine 1 [running]:
github.com/go-rod/rod/lib/utils.glob..func2({0x100742dc0?, 0x140002bad50?})
    /users/lucastomicbenitez/go/pkg/mod/github.com/go-rod/rod@.../lib/utils/utils.go:65 +0x24
github.com/go-rod/rod.genE.func1({0x14000281ca0?, 0x1003a98b7?, 0x4?})
    /users/lucastomicbenitez/go/pkg/mod/github.com/go-rod/rod@.../must.go:36 +0x64
github.com/go-rod/rod.(*Element).MustClick(0x14000289320)
    /users/lucastomicbenitez/go/pkg/mod/github.com/go-rod/rod@.../must.go:729 +0x9c
main.main()
    /users/lucastomicbenitez/development/golang/evolutionaryalgorithm/main/main.go:22 +0x9c
exit status 2

While looking for a solution, I found a GitHub issue on the subject and tried to read the link directly with this approach:

link := page.MustEval(`()=> document.querySelector('aSelector...').href`)

But it returns this:

panic: eval js error: TypeError: Cannot read properties of null (reading 'href')

However, I'm fairly sure the selector is correct. What am I doing wrong?

Workaround

As @hymns for disco said in the comments, I just had to wait briefly after filling in the search box, so that the page has time to render the results before the element is queried:

el.MustInput("Lionel Messi")

time.Sleep(time.Second)

link := page.MustEval(`()=> document.querySelector('aSelector...').href`)
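
The fixed time.Sleep above works, but the right delay depends on how fast the page renders. One possible alternative, sketched here under assumptions rather than taken from the original answer, is to retry the lookup a few times with a short pause between attempts. The helper retry and the sentinel errNotReady are hypothetical names introduced for illustration, and the sketch uses only the standard library so it runs without a browser. With rod, the callback would call the non-panicking page.Element and el.Click and return their error instead of the stand-in shown here.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotReady is a hypothetical sentinel meaning "the element is not usable yet".
var errNotReady = errors.New("element not ready")

// retry calls fn up to attempts times (attempts should be at least 1),
// sleeping delay between failed tries. It returns nil on the first success,
// or the last error wrapped with the attempt count.
func retry(attempts int, delay time.Duration, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		// Do not sleep after the final failed attempt.
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

func main() {
	// Stand-in for the real lookup: it fails twice and then succeeds,
	// mimicking a result list that is still being re-rendered.
	calls := 0
	err := retry(5, 200*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errNotReady
		}
		return nil
	})
	fmt.Println(calls, err) // prints "3 <nil>"
}
```

Compared with a single one-second sleep, this returns as soon as the lookup succeeds and reports an error instead of panicking if the element never becomes usable.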


Statement:
This article is reproduced from stackoverflow.com. If there is any infringement, please contact admin@php.cn to have it removed.