
Implementing a web crawler in NodeJS: example code

Jun 26, 2017 10:28 AM

Preface

This article uses Node.js to implement a simple web crawler.

Web page source code

Use the http.get() method to fetch a web page's source code. The example below uses the hot-ranking (headline) page of the hao123 website:

http://tuijian.hao123.com/hotrank
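var http = require('http');
http.get('http://tuijian.hao123.com/hotrank', function (res) {
    var data = '';
    // Decode the body as UTF-8 so multi-byte (Chinese) characters split across chunks are not garbled
    res.setEncoding('utf8');
    res.on('data', function (chunk) {
        data += chunk;
    });
    res.on('end', function () {
        console.log(data);
    });
}).on('error', function (err) {
    console.error('Request failed: ' + err.message);
});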

The results obtained are as follows:

The output is the complete HTML source of the page, several hundred lines long. The page title is 热点排行榜-头条新闻-hao123新闻导航_hao123上网导航 ("Hot rankings - Headline news - hao123 news navigation"). Apart from the header navigation, an image slider of news stories, a 八卦热点 (gossip) box, the footer and a large number of scripts, the body holds six ranking lists: 实时热点 (real-time hot topics), 今日热点 (today's hot topics), 民生热点 (people's livelihood hot topics), 电影 (movies), 电视剧 (TV series) and 综艺 (variety shows). Each list has the column headings 排名 / 关键词 / 搜索指数 (rank / keyword / search index) followed by ten entries; the 综艺 list, for example, begins:

1   变形计       223319
2   来吧冠军     151641
3   拜托了冰箱   149596
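The callback code above prints whatever the server sends back, even an error page, and it is not easy to reuse. A minimal sketch of a reusable version wraps http.get() in a Promise that resolves with the page body and rejects on network errors or non-2xx status codes. The helper name fetchPage is hypothetical (it is not part of Node's API), and as written it treats redirects (3xx) as failures rather than following them.

```javascript
var http = require('http');

// Hypothetical helper (not part of Node's API): fetch a page over plain HTTP and
// resolve with its body as a UTF-8 string. Rejects on network errors and on any
// status code outside 200-299; redirects are NOT followed.
function fetchPage(url) {
  return new Promise(function (resolve, reject) {
    http.get(url, function (res) {
      if (res.statusCode < 200 || res.statusCode >= 300) {
        res.resume(); // discard the body so the socket is released
        reject(new Error('HTTP ' + res.statusCode + ' for ' + url));
        return;
      }
      var data = '';
      res.setEncoding('utf8'); // avoid splitting multi-byte characters between chunks
      res.on('data', function (chunk) { data += chunk; });
      res.on('end', function () { resolve(data); });
    }).on('error', reject);
  });
}

// Usage:
// fetchPage('http://tuijian.hao123.com/hotrank')
//   .then(function (html) { console.log(html.length); })
//   .catch(function (err) { console.error(err.message); });
```

Returning a Promise lets the fetching step be chained with a separate parsing step, which keeps the two concerns apart.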

Filter data

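Take the 综艺 (variety shows) hot list on the page as an example.

Analyzing the relevant part of the page source (as the selectors in the code that follows reflect), the variety-show module and the other ranking modules each sit in a div with class "top-wrap", the variety-show module can be selected by its monkey="zy" attribute, and each show name sits in an element with class "point-title" inside the module's "point-bd" container.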

cheerio

How do we extract useful data from this source? Node.js has no document object (there is no browser DOM on the server), so without a library the only crude option is to process the HTML with regular expressions.
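To show what the crude approach looks like, here is a minimal sketch that pulls the page title and the h2 list names (such as 综艺) out of raw HTML with regular expressions. The function names extractTitle and extractH2Texts are hypothetical. The patterns assume each tag contains plain text only, which is exactly why this approach is fragile.

```javascript
// Crude regex extraction: works only while the markup keeps exactly this shape.
// Function names are hypothetical, for illustration only.
function extractTitle(html) {
  var match = /<title>([^<]*)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

// Collect the text of every <h2> element (on this page, the ranking-list names).
function extractH2Texts(html) {
  var re = /<h2\b[^>]*>([^<]*)<\/h2>/gi;
  var texts = [];
  var m;
  while ((m = re.exec(html)) !== null) {
    texts.push(m[1].trim());
  }
  return texts;
}

var sampleHtml = '<title>热点排行榜</title><h2 id="电影">电影</h2><h2 id="综艺">综艺</h2>';
console.log(extractTitle(sampleHtml));   // 热点排行榜
console.log(extractH2Texts(sampleHtml)); // [ '电影', '综艺' ]
```

For example, extractH2Texts('<h2>a <b>b</b></h2>') returns an empty array because of the nested b tag. Coping with markup like this is what a real HTML parser such as cheerio is for.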

cheerio is a fast, flexible implementation of the core of jQuery designed specifically for the server. It works on a DOM model and is efficient at parsing, manipulating and rendering markup.

【Installation】

npm install cheerio

【Usage】

Its usage is very similar to jQuery, so it is easy to get started. The first example scrapes the titles of the top 10 variety shows by search volume. The second scrapes the rankings of all six lists, 'Real-time hot topics', 'Today's hot topics', 'People's livelihood hot topics', 'Movies', 'TV series' and 'Variety shows', and stores each list as an array in an object named result. The six lists are named 'ss', 'jr', 'ms', 'dy', 'dsj' and 'zy' respectively (pinyin abbreviations; 'zy' is the monkey attribute value the first example selects).

[The code is as follows]

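var http = require('http');
var cheerio = require('cheerio');
http.get('http://tuijian.hao123.com/hotrank', function (res) {
    var data = '';
    res.setEncoding('utf8'); // decode as UTF-8 so Chinese characters are not split across chunks
    res.on('data', function (chunk) {
        data += chunk;
    });
    res.on('end', function () {
        filter(data);
    });
}).on('error', function (err) { console.error('Request failed: ' + err.message); });
function filter(data) {
    var result = []; // titles of the top 10 variety shows by search volume
    var $ = cheerio.load(data); // turn the page source into a jQuery-like $ object
    // Find each variety-show title inside the module marked monkey="zy"
    var temp_arr = $('[monkey="zy"]').find('.point-bd').find('.point-title');
    // Save the titles to the result array in order
    temp_arr.each(function (index, item) {
        result.push($(item).text());
    });
    // [ '变形计','来吧冠军','拜托了冰箱','昆仑决','天生是优我','姐姐好饿','脑力男人时代','奔跑吧兄弟','我想和你唱','玫瑰之旅' ]
    console.log(result);
}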

[The code for all six lists is as follows]
var http = require('http');
var cheerio = require('cheerio');
http.get('http://tuijian.hao123.com/hotrank', function (res) {
    var data = '';
    res.setEncoding('utf8'); // decode as UTF-8 so Chinese characters are not split across chunks
    res.on('data', function (chunk) {
        data += chunk;
    });
    res.on('end', function () {
        filter(data);
    });
}).on('error', function (err) { console.error('Request failed: ' + err.message); });
function filter(data) {
    var result = {}; // list name (e.g. '实时热点') -> array of that list's 10 titles
    var $ = cheerio.load(data); // turn the page source into a jQuery-like $ object
    // The divs holding the six lists 实时热点, 今日热点, 民生热点, 电影, 电视剧 and 综艺
    var temp_div = $('.top-wrap');
    var temp_title = []; // list names
    temp_div.each(function (index, item) {
        // Find the list name and save it in the temp_title array
        temp_title.push($(item).find('h2').text());
        // Find the element wrapping each title in this list
        var temp_arr = $(item).find('.point-bd').find('.point-title');
        // Initialise this list's entry in result as an empty array
        var innerResult = result[temp_title[index]] = [];
        // Save the titles to that array in order
        temp_arr.each(function (_index, _item) {
            innerResult.push($(_item).text());
        });
    });
    console.log(result);
}
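The result object maps each list name to its titles in page order, so a title's rank is only implicit in its array index. If flat records are more convenient, for example for writing to a CSV file or a database, a small post-processing step can make the rank explicit. This is a sketch: the function name flattenRankings and the record shape { list, rank, title } are hypothetical choices, and it assumes the titles appear on the page in rank order.

```javascript
// Hypothetical helper: flatten { listName: [title, ...] } into ranked records.
// Rank is 1-based and taken from each title's position in its array.
function flattenRankings(result) {
  var rows = [];
  Object.keys(result).forEach(function (list) {
    result[list].forEach(function (title, i) {
      rows.push({ list: list, rank: i + 1, title: title });
    });
  });
  return rows;
}

// Sample data using entries from the 综艺 and 电影 lists
var sampleResult = { '综艺': ['变形计', '来吧冠军'], '电影': ['神奇女侠'] };
console.log(flattenRankings(sampleResult));
```

Object.keys() returns string keys such as these in insertion order, so the records follow the order in which the lists were found on the page.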
