
Extract JavaScript-generated content from a specific page

I want to extract the contents of containers like the following:

<section class="tiw-line-name " id="EU-group-holiday-line-0" data-side="both">
<a href="/event=479/darkmoon-faire"><img src="https://wow.zamimg.com/images/wow/icons/tiny/calendar_darkmoonfaireelwynnstart.gif">Darkmoon Faire</a>
</section>

Usually I use XPath like this:

$xpath->query('//*[contains(@id, "EU-group-holiday-line")]');

The problem is that the website seems to generate this content with JavaScript. I also don't see any XHR requests that I could query instead.

Is there any way to extract the data?

To clarify: this is not my website, so I have to scrape it.

This is the complete page:

https://www.wowhead.com/today-in-wow

Asked by P粉536909186, 378 days ago


  • P粉041758700, 2023-09-11 13:02:05

    You are right: the site renders this content with client-side JavaScript and makes no additional XHR requests for it. So we can expect the data to be embedded in the initially loaded source (the HTML or its inline JavaScript). Searching that source for something like event=643 (analogous to your event=479) confirms this and turns up the relevant JSON-formatted substring (indentation added):

    {
       "icon": "calendar_weekendmistsofpandariastart",
       "name": "Timewalking Dungeon Event",
       "side": "both",
       "url": "/event=643/timewalking-dungeon-event"
    },

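    So the initially loaded source already contains the data you need, and you can scrape it from there with a variety of tools, for example by matching the JSON substring in the raw page source and decoding it.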
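
As a rough illustration of that idea, here is a minimal PHP sketch. It assumes the event objects are flat (no nested braces) and each carries a "url" key of the form /event=<id>/<slug>, as in the substring above. The function name extractEvents, the regular expression, and the sample $html string (including its "groups" wrapper) are made up for this example and are not taken from the site. The real page may embed the data differently, so treat the pattern as a starting point.

```php
<?php
// Hypothetical helper: collect every flat JSON object in $html that has
// a "url" key pointing at /event=<id>/... and decode it to an array.
function extractEvents(string $html): array
{
    // Match one {...} object without nested braces that contains "url".
    // The optional backslash covers JSON that escapes slashes as \/.
    $pattern = '~\{[^{}]*"url"\s*:\s*"\\\\?/event=\d+[^"]*"[^{}]*\}~';

    $events = [];
    if (preg_match_all($pattern, $html, $matches)) {
        foreach ($matches[0] as $json) {
            $event = json_decode($json, true);
            // Keep only substrings that really are valid JSON objects.
            if (is_array($event) && isset($event['url'])) {
                $events[] = $event;
            }
        }
    }
    return $events;
}

// Made-up stand-in for the page source. For the live page you would
// load the HTML first, e.g. with file_get_contents() or cURL.
$html = '<script>var data = {"groups":[{"icon":"calendar_weekendmistsofpandariastart",'
      . '"name":"Timewalking Dungeon Event","side":"both",'
      . '"url":"/event=643/timewalking-dungeon-event"},'
      . '{"icon":"calendar_darkmoonfaireelwynnstart","name":"Darkmoon Faire",'
      . '"side":"both","url":"/event=479/darkmoon-faire"}]};</script>';

foreach (extractEvents($html) as $event) {
    echo $event['name'], ' => ', $event['url'], PHP_EOL;
}
```

A regular expression is enough here only because the objects are flat. If the embedded data turns out to be nested, it is probably more robust to cut out the whole JSON literal and pass it to json_decode() in one go.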
