javascript - Crawling Weibo with Node

I am new to Node and want to write a crawler for Sina Weibo comments. The page is generated dynamically by JavaScript, so fetching it with the http module does not return the comments. I switched to PhantomJS (I had heard it would be slower), but the script has been running for nearly 15 minutes without finishing, so I suspect I wrote something wrong. Either way, it still doesn't work. Is there any way to crawl pages like Sina Weibo?

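// PhantomJS 2.x only parses ES5, so use var and function instead of let and arrow functions.
var page = require("webpage").create();
var url = "http://weibo.com/1713926427/Etq2WnSiR?filter=hot&root_comment_id=0&type=comment";
/*page.settings = {
    javascriptEnabled: true,
    loadImages: false,
    webSecurityEnabled: false,
    userAgent: 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/537.36 LBBROWSER'
};*/
// console.log calls made inside page.evaluate are only visible through this handler.
page.onConsoleMessage = function (msg) {
    console.log("page: " + msg);
};
page.open(url, function (status) {
    console.log("Status: " + status);
    if (status !== "success") {
        console.log("failed");
        phantom.exit(1);
        return;
    }
    // The comments are injected by scripts after the load event, so wait before reading the DOM.
    // The 5-second delay is a guess; adjust it, or poll until .list_box appears.
    setTimeout(function () {
        var val = page.evaluate(function () {
            var list_box = document.querySelector(".list_box");
            // DOM nodes cannot be returned from evaluate(); return a serializable string instead.
            return list_box ? list_box.innerText : null;
        });
        console.log(val);
        phantom.exit();
    }, 5000);
});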
仅有的幸福 · 2746 days ago · 1013

All replies (1)

  • 扔个三星炸死你  2017-06-30 10:02:08

    I have written a Weibo crawler before. There are two approaches:

    1. If you look carefully at the requests the page makes, there should be an interface that returns the comment data. Request that interface directly, then use regular expressions to extract what you need.

    2. Weibo provides a developer API, although it is more troublesome to use.
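
    The first approach could be sketched roughly as follows in Node. This is only an illustration of the "fetch the data interface, then match with a regular expression" idea: the response shape ({ data: { html: ... } }), the WB_text class name, and the sample payload are all assumptions made up for the example, not a confirmed description of Weibo's interface. Inspect the real response in your browser's network panel and adjust the pattern accordingly.

```javascript
// Extract comment text from the body returned by a data interface.
// ASSUMPTION: the body is JSON shaped like { data: { html: "<div ...>" } }
// and each comment sits in a <div> whose class list contains "WB_text".
function extractComments(responseBody) {
    const payload = JSON.parse(responseBody);
    const html = (payload.data && payload.data.html) || "";
    const pattern = /<div[^>]*class="[^"]*\bWB_text\b[^"]*"[^>]*>([\s\S]*?)<\/div>/g;
    const comments = [];
    let match;
    while ((match = pattern.exec(html)) !== null) {
        // Strip nested tags (links, images) and collapse runs of whitespace.
        const text = match[1].replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim();
        if (text) comments.push(text);
    }
    return comments;
}

// Made-up sample payload standing in for a real response.
const sample = JSON.stringify({
    code: "100000",
    data: {
        html: '<div class="list_li"><div class="WB_text"><a href="/u/1">alice</a>: first comment</div></div>' +
              '<div class="list_li"><div class="WB_text"><a href="/u/2">bob</a>: second   comment</div></div>'
    }
});

console.log(extractComments(sample));
```

    In a real crawler the string passed to extractComments would come from an HTTP request to the interface, most likely with your login cookies attached. Because the request skips page rendering entirely, it should be much faster than driving a headless browser.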
