node.js - Node crawler: setting a proxy with request always throws an error

I use request to crawl images. To keep my IP from being blocked, I route the requests through a proxy, but once the proxy is enabled I always get an error. The code uses the request and async modules for Node.js:

var fs = require('fs');
var request = require('request');
var async = require('async');

function download(item, cb) {
  request({
    url: item.img,
    // pick a random proxy from the proxys list
    proxy: proxys[Math.random() * proxys.length | 0],
    method: 'GET',
    timeout: 5000
  }, function (err, response, body) {
    if (response && response.statusCode == 200) {
      cb(null, item);
    }
  }).on('error', function () {
    console.log('Download failed, possibly a pipe problem; requesting again...');
    download(item, cb);
    // cb(null, item);
  }).pipe(fs.createWriteStream(fileDir2 + item.name + '.' + item.url_token + '.jpg'));
}

In download(item, cb), cb is the control-flow callback supplied by async:

async.eachLimit(items,10,function(item,cb){
    download(item,cb);
},function(){...})

Every time, after a few files have been downloaded, the program throws an error and stops running:

throw new assert.AssertionError({
  ^
AssertionError: 258 == 0
at ClientRequest.onConnect (C:\Users\fox\WebstormProjects\nodejs\实战\爬虫\node_modules\tunnel-agent\index.js:160:14)

If I remove the proxy option from the request, the error does not occur. If I change download so that it no longer retries on error and calls cb() directly, failed requests no longer raise the error either:

.on('error', function () {
  console.log('Download failed, possibly a pipe problem; requesting again...');
  // download(item, cb);
  cb(null, item);
})

Could someone take a look and help me solve this? I have been troubleshooting it for a long time and still don't know the cause.

学习ing, 2798 days ago

Replies (2)

  • 伊谢尔伦, 2017-06-21 10:14:18

    I built almost the same thing before: downloading a large number of pictures directly. Part of them downloaded, and then an error was reported. In the end I wrapped the call in a setTimeout, like this:

    setTimeout(function(){
        download(item, cb);
    },400);

    This actually works well. I wrote a blog post about it, "nodejs batch downloading pictures", which you can refer to.

  • 过去多啦不再A梦, 2017-06-21 10:14:18

    A program that runs into this kind of problem needs a retry mechanism.
    A good retry mechanism increases the sleep time appropriately before each subsequent attempt, so that execution eventually completes correctly.
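
The retry-with-growing-sleep idea in this reply could be sketched roughly as follows. This is only an illustration, not the replier's code: backoffDelay and retryWithBackoff are hypothetical names, the doubling schedule and the retry cap are assumptions, and the schedule option exists only so the logic can be exercised without real timers.

```javascript
// Hypothetical helper: pause before retry number `attempt` (1-based).
// The delay doubles on each retry and is capped at maxMs (an assumed policy).
function backoffDelay(attempt, baseMs, maxMs) {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt - 1));
}

// Hypothetical helper: runs task(callback) and retries failed attempts,
// sleeping longer before each one. Gives up after options.maxRetries retries.
// options.schedule defaults to setTimeout and is injectable for testing.
function retryWithBackoff(task, options, done) {
  var schedule = options.schedule || setTimeout;
  var attempt = 0;

  function run() {
    task(function (err, result) {
      if (!err) return done(null, result);
      attempt += 1;
      // Stop retrying and report the last error once the cap is reached.
      if (attempt > options.maxRetries) return done(err);
      schedule(run, backoffDelay(attempt, options.baseMs, options.maxMs));
    });
  }

  run();
}

// Usage sketch: attemptDownload is a hypothetical function that performs one
// request for `item` and calls back exactly once with (err) or (null, item).
// retryWithBackoff(function (next) { attemptDownload(item, next); },
//   { maxRetries: 3, baseMs: 400, maxMs: 5000 },
//   function (err) { cb(null, item); });
```

Capping the number of retries keeps a permanently failing URL from looping forever, and calling cb(null, item) in the final callback lets async.eachLimit move on to the next item instead of aborting the whole batch.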
