
How Python crawler handles the delayed loading part (delayload_url) in html_html/css_WEB-ITnose

WBOY (Original) · 2016-06-24 11:47:22 · 2477 views

When I download the source code of the link "http://s.1688.com/selloffer/industry_offer_search.htm?mixWholesale=true&industryFlag=food&categoryId=1032913&from=industrySearch&n=y&filt=y#_fb_top", the result contains only part of the page. There are 60 products on this page in total, but only 20 can be parsed from the source code, and the page-turning (pagination) link cannot be found.



This appears to be delayed (lazy) loading: the remaining content is loaded only when the page is scrolled to the bottom. How can I obtain the complete page source and parse all 60 products and the pagination links?


Replies (solutions)

Inspect the element to find the data-source link, then request that link directly to obtain the data.
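The idea above can be sketched as follows. This is a minimal sketch, assuming a hypothetical endpoint URL and parameter names (`startIndex`, `pageSize`); the real URL and parameters must be copied from the request you see in the browser's network panel when the page loads more items on scroll.

```python
from urllib.parse import urlencode
from urllib.request import urlopen  # used only when you actually fetch

# Hypothetical data-source endpoint discovered via the network panel;
# replace it with the request URL from your own devtools session.
API = "http://s.1688.com/selloffer/rpc/offerlist.json"

def page_url(start_index, page_size=20):
    """Build the URL for one page of results from the data-source endpoint."""
    query = urlencode({"startIndex": start_index, "pageSize": page_size})
    return f"{API}?{query}"

# urlopen(page_url(20)).read() would then return the raw response for page 2.
```

Fetching the endpoint directly avoids rendering the page at all, so no browser automation is needed to trigger the scroll-based loading.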

Um... I don't know if it's too late to answer now! You can capture the delayed-loading URL with Firefox (its network tools) and then work out the pattern. I happened to be crawling 1688 data and ran into the same delayed-loading problem. After capturing the URL with Firefox, I found that you only need to take the URL out of the `sw-delayload-url` div, append `&callback=<any string>` to the end, and then change `&startIndex=` for each page (startIndex=20, startIndex=40, ...); the request then returns JSON data.

I tried the URL you posted, but for some reason no data comes back; perhaps the products have been taken off the shelves. You can try what I described. If you have solved it, or have a better method, I hope you will share it with me. Thank you.
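The recipe above can be sketched like this. It is a sketch under assumptions: the sample HTML and the `data-url` attribute holding the delayed-load URL are hypothetical (check the real page source for how the `sw-delayload-url` div actually embeds it), and the response is assumed to be JSONP wrapped in the callback name you appended.

```python
import json
import re

# Hypothetical sample of the markup the answer describes: the delayed-load
# endpoint stored on the "sw-delayload-url" div.
sample_html = (
    '<div class="sw-delayload-url" '
    'data-url="http://s.1688.com/selloffer/rpc/offers.json?categoryId=1032913">'
    '</div>'
)

def extract_delayload_url(html):
    """Return the endpoint embedded in the sw-delayload-url div, or None."""
    m = re.search(r'sw-delayload-url"[^>]*data-url="([^"]+)"', html)
    return m.group(1) if m else None

def jsonp_page_url(base, start_index, callback="cb"):
    """Append the callback and startIndex parameters the answer describes."""
    return f"{base}&callback={callback}&startIndex={start_index}"

def strip_jsonp(text):
    """Unwrap a JSONP body like cb({...}) into a plain Python object."""
    m = re.match(r"^[\w$.]+\((.*)\)\s*;?\s*$", text, re.S)
    return json.loads(m.group(1) if m else text)

base = extract_delayload_url(sample_html)
urls = [jsonp_page_url(base, s) for s in (20, 40)]  # pages 2 and 3
data = strip_jsonp('cb({"count": 60})')             # example response body
```

Since the server wraps the JSON in whatever callback name you pass, stripping the wrapper before `json.loads` is what makes the paged responses usable.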
