I want to implement some code to collect comments from a specific page DOM.
The cURL result is incomplete, and I don't know why: some child tags that are visible in the DOM are missing from the result.
The DOM looks like this in the inspector:
I try to collect the DOM using the following code snippet:
$domain = 'feefo.com';
$page_id = 'firebrand-promotions';

$curli = curl_init();
curl_setopt_array($curli, [
    CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_FRESH_CONNECT => true,
    CURLOPT_URL => 'https://www.' . $domain . '/en-US/reviews/' . $page_id . '?displayFeedbackType=SERVICE&timeFrame=YEAR',
    CURLOPT_HTTPHEADER => [
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'Accept-Language: en-US;q=0.8,en;q=0.7',
        'Cache-control: max-age=0',
        'Referer: https://' . $domain,
        'sec-fetch-mode: navigate',
        'sec-fetch-site: none',
        'sec-fetch-dest: document',
        'sec-fetch-user: ?1',
        'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36',
    ],
]);
$curlResult = curl_exec($curli);
What I see in the cURL result content section is this:
<div class="container"> <global></global> </div>
So the <global> tag looks empty, but it shouldn't be.
I try to extract the <global> tag content using the following code:
$dom = new DOMDocument();
$dom->validateOnParse = true;
@$dom->loadHTML($curlResult);
$globals = $dom->getElementsByTagName('global');
$xmlPath = new DOMXPath($dom);
$reviews = $xmlPath->query('//global');
But I still don't see any child tags inside the <global> tag.
Can someone explain this problem to me, and how to solve it?
Thank you very much for your help, effort and time. :)
P粉124070451 2023-09-13 15:04:30
It's very possible that what you get from cURL is exactly what the browser receives, but the browser then executes JavaScript that modifies the DOM.
You can't see that with cURL because cURL cannot execute JavaScript.
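One way to check this diagnosis is to test whether the element in the raw response already has child elements. The sketch below is only an illustration: hasRenderedChildren() is a hypothetical helper name, and the sample HTML strings are made up to mimic the response shown in the question.

```php
<?php
// Hypothetical helper: report whether the first <$tagName> element in a
// static HTML string already contains child elements. Returns false when
// the element is empty or not present at all.
function hasRenderedChildren(string $html, string $tagName): bool
{
    $dom = new DOMDocument();
    // Unknown tags such as <global> make libxml emit warnings; collect them quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Count the element children of the first matching tag.
    $count = $xpath->evaluate('count((//' . $tagName . ')[1]/*)');

    return $count > 0;
}

// Sample input shaped like the cURL response from the question.
$staticHtml = '<div class="container"> <global></global> </div>';
var_dump(hasRenderedChildren($staticHtml, 'global')); // bool(false)
```

If this returns false for the cURL response while the inspector shows children, the content is being injected client-side. In that case the options are probably a headless browser that executes the page's scripts, or requesting directly whatever data source the page loads (visible in the browser's network tab).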