nosql - A question about MongoDB's MapReduce (MR)!

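I have heard that MongoDB's MapReduce is single-threaded and performs very poorly. Why is that, and how bad is it? Could someone explain the underlying principle?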

怪我咯 2767 days ago 778

All replies (3)

  • PHP中文网

    PHP中文网 2017-04-21 11:18:15

    I don't know whether mapReduce runs single-threaded internally, but in a production environment it is best not to run mapReduce on every request. Depending on the data size, each run takes a noticeable amount of time. Our data set is in the tens of millions of documents, and each mapReduce run takes about 5-6 seconds. Fortunately, our application does not need real-time results, so we cache the data for 2 hours and then rerun mapReduce to get the latest results.
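
    The 2-hour caching described above could be sketched roughly as follows. This is only a minimal illustration, not the poster's actual code: `makeCachedResult`, `getStats`, and the fake clock are hypothetical names, and the stand-in `compute` function takes the place of the real mapReduce call.

```javascript
// Hypothetical helper: wraps an expensive computation in a time-based cache.
// `compute` stands in for the mapReduce call; `now` is injectable for testing.
function makeCachedResult(compute, ttlMs, now = Date.now) {
  let cachedValue;
  let cachedAt = null; // null means nothing has been cached yet

  return function get() {
    const t = now();
    if (cachedAt === null || t - cachedAt >= ttlMs) {
      cachedValue = compute(); // cache miss or expired entry: recompute
      cachedAt = t;
    }
    return cachedValue;
  };
}

const TWO_HOURS_MS = 2 * 60 * 60 * 1000;

// Stand-in for the real job; it only counts how often it actually runs.
let runs = 0;
let fakeClock = 0;
const getStats = makeCachedResult(
  () => { runs += 1; return { run: runs }; },
  TWO_HOURS_MS,
  () => fakeClock
);

getStats();                 // first call computes
fakeClock += 60 * 1000;     // one minute later
getStats();                 // served from the cache
fakeClock += TWO_HOURS_MS;  // past the 2-hour window
getStats();                 // recomputed
console.log(runs);          // number of times the expensive job ran
```

    With this shape, readers only pay the 5-6 second cost when the cached value has expired; every other call returns immediately.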

  • ringa_lee

    ringa_lee 2017-04-21 11:18:15

    I think this article explains MongoDB's performance issues:
    http://stackoverflow.com/questions/39...

  • 伊谢尔伦

    伊谢尔伦 2017-04-21 11:18:15

    I have used MapReduce for something similar before. Because it was too slow, I later switched to an aggregate query for the statistics. Here is the specific example:

    > db.user.findOne()
    {
        "_id" : ObjectId("557a53e1e4b020633455b898"),
        "accountId" : "55546fc8e4b0d8376000b858",
        "tags" : [
            "金牌会员",
            "钻石会员",
            "铂金会员",
            "高级会员"
        ]
    }

    The basic document model is shown above. I created a compound index on accountId and tags:

    db.user.createIndex({"accountId": 1, "tags": 1})    // createIndex replaces the deprecated ensureIndex

    The requirement is to count how many users have each tag. The MapReduce job is designed as follows:

    // Map: emit (tag, 1) once for every tag on the document.
    var mapFunction = function() {
        if (this.tags) {
            for (var idx = 0; idx < this.tags.length; idx++) {
                var tag = this.tags[idx];
                emit(tag, 1);
            }
        }
    };

    // Reduce: sum the emitted values for one tag.
    var reduceFunction = function(key, values) {
        var cnt = 0;
        values.forEach(function(val) { cnt += val; });
        return cnt;
    };

    db.user.mapReduce(mapFunction, reduceFunction, {out: "mr1"})    // write the output to the collection mr1

    Result:

    > db.mr1.find().pretty()
    { "_id" : "金牌会员", "value" : 9000 }
    { "_id" : "钻石会员", "value" : 43000 }
    { "_id" : "铂金会员", "value" : 90000 }
    { "_id" : "铜牌会员", "value" : 3000 }
    { "_id" : "银牌会员", "value" : 5000 }
    { "_id" : "高级会员", "value" : 50000 }

    That appears to give the result we wanted. I ran the test above on a small data set of about 100,000 documents. The mapReduce call itself returns the following output:

    > db.mapReduceTest.mapReduce(mapFunction,reduceFunction,{out:"mr1"})
    {
        "result" : "mr1",
        "timeMillis" : 815,                   // elapsed time in milliseconds
        "counts" : {
            "input" : 110000,             // number of documents scanned
            "emit" : 200000,              // number of emit() calls made by the map function
            "reduce" : 2001,
            "output" : 6
        },
        "ok" : 1
    }
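
    To make the counts in this output concrete, here is a small in-memory sketch of the map, group, and reduce steps. It is an illustration only, not how MongoDB implements mapReduce: `simulateMapReduce`, `mapTags`, `sumValues`, and the sample documents are hypothetical, and `emit` is passed in as a parameter here, whereas MongoDB provides it to the map function itself. The real server may also call reduce several times per key on partial batches, which this sketch does not reproduce.

```javascript
// In-memory sketch of the map -> group -> reduce flow, with counters.
function simulateMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map(); // key -> array of emitted values
  let emitCount = 0;

  const emit = (key, value) => {
    emitCount += 1;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };

  for (const doc of docs) mapFn(doc, emit);

  const output = [];
  let reduceCount = 0;
  for (const [key, values] of groups) {
    let value = values[0];
    if (values.length > 1) { // a key with a single value is not reduced
      value = reduceFn(key, values);
      reduceCount += 1;
    }
    output.push({ _id: key, value });
  }

  return {
    output,
    counts: {
      input: docs.length,
      emit: emitCount,
      reduce: reduceCount,
      output: output.length,
    },
  };
}

// Same logic as the map and reduce functions in the post.
const mapTags = (doc, emit) => {
  if (doc.tags) doc.tags.forEach((tag) => emit(tag, 1));
};
const sumValues = (key, values) => values.reduce((sum, v) => sum + v, 0);

// Hypothetical sample data: most documents carry two tags.
const sampleDocs = [
  { tags: ["gold", "diamond"] },
  { tags: ["gold", "platinum"] },
  { tags: ["diamond", "gold"] },
  {}, // no tags: the map function emits nothing
];

const result = simulateMapReduce(sampleDocs, mapTags, sumValues);
console.log(JSON.stringify(result.counts));
console.log(JSON.stringify(result.output));
```

    The emit count grows with the total number of tags across all documents rather than with the number of documents, which is why documents with about two tags each produce roughly twice as many emits as inputs.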

    Because my mock data is fairly simple and regular, the number of calculations (emits) is almost twice the number of scanned documents. When I later tested with random data, the results were even worse, so I gave up on the MapReduce implementation and switched to another approach.
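
    The post does not show the aggregate query that replaced MapReduce. One possible pipeline for the same per-tag count, using the standard `$unwind` and `$group` stages, might look like the sketch below. The variable name `tagCountPipeline` and the `$sort` stage are assumptions added for illustration; the `db.user` collection is the one from the post.

```javascript
// One possible pipeline for the per-tag count (not the poster's actual query).
var tagCountPipeline = [
  { $unwind: "$tags" },                             // one document per array element
  { $group: { _id: "$tags", value: { $sum: 1 } } }, // count documents per tag
  { $sort: { _id: 1 } }                             // stable ordering for display
];

// `db` only exists inside the mongo shell; elsewhere this just defines the pipeline.
if (typeof db !== "undefined") {
  db.user.aggregate(tagCountPipeline).forEach(printjson);
}
```

    By default `$unwind` drops documents whose `tags` field is missing or empty, which matches the `if (this.tags)` check in the map function.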
