
Implementing EXISTS-style deduplication in MongoDB

db.getCollection('user').insertOne({"uid":"1123", "logTime":ISODate('2016-05-23 11:11:23')});
db.getCollection('user').insertOne({"uid":"1124", "logTime":ISODate('2016-05-23 05:11:23')});
db.getCollection('user').insertOne({"uid":"1125", "logTime":ISODate('2016-05-23 08:11:23')});
db.getCollection('user').insertOne({"uid":"1126", "logTime":ISODate('2016-05-23 09:12:23')});
db.getCollection('user').insertOne({"uid":"1127", "logTime":ISODate('2016-05-23 11:23:23')});
db.getCollection('user').insertOne({"uid":"1134", "logTime":ISODate('2016-05-23 11:11:23')});
db.getCollection('user').insertOne({"uid":"1123", "logTime":ISODate('2016-05-21 11:11:23')});
db.getCollection('user').insertOne({"uid":"1125", "logTime":ISODate('2016-05-22 11:11:23')});
db.getCollection('user').insertOne({"uid":"2343", "logTime":ISODate('2016-04-23 11:11:23')});
db.getCollection('user').insertOne({"uid":"9873", "logTime":ISODate('2016-04-23 11:11:23')});
db.getCollection('user').insertOne({"uid":"4321", "logTime":ISODate('2016-04-20 11:11:23')});

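The data above simulates a user collection that records user logins. I want to count the new users on 2016-05-23, i.e. users who logged in on 5-23 but have no record before 5-23. For example, uid='1123' in the first record also logged in on 5-21, so uid='1123' should not be counted in the number of users who logged in on 5-23.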
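To pin down the definition, here is a plain JavaScript sketch (runnable in Node.js, not in the mongo shell) that applies it to the sample records in memory. It assumes the ISODate values are UTC and that a "day" is the UTC interval [00:00, 24:00); the helper name newUsersOn is hypothetical.

```javascript
// Sample records from the question; timestamps assumed to be UTC.
const logs = [
  ['1123', '2016-05-23T11:11:23Z'],
  ['1124', '2016-05-23T05:11:23Z'],
  ['1125', '2016-05-23T08:11:23Z'],
  ['1126', '2016-05-23T09:12:23Z'],
  ['1127', '2016-05-23T11:23:23Z'],
  ['1134', '2016-05-23T11:11:23Z'],
  ['1123', '2016-05-21T11:11:23Z'],
  ['1125', '2016-05-22T11:11:23Z'],
  ['2343', '2016-04-23T11:11:23Z'],
  ['9873', '2016-04-23T11:11:23Z'],
  ['4321', '2016-04-20T11:11:23Z'],
].map(([uid, t]) => ({ uid, logTime: new Date(t) }));

// A uid is "new" on [dayStart, dayEnd) if it has a login in that range
// and no login at all before dayStart.
function newUsersOn(records, dayStart, dayEnd) {
  const seenBefore = new Set(
    records.filter(r => r.logTime < dayStart).map(r => r.uid)
  );
  const onDay = new Set(
    records
      .filter(r => r.logTime >= dayStart && r.logTime < dayEnd)
      .map(r => r.uid)
  );
  return [...onDay].filter(uid => !seenBefore.has(uid)).sort();
}

const newOn0523 = newUsersOn(
  logs,
  new Date('2016-05-23T00:00:00Z'),
  new Date('2016-05-24T00:00:00Z')
);
console.log(newOn0523.length, newOn0523);
```

For the sample data this yields 4 new users on 2016-05-23 (1124, 1126, 1127 and 1134); 1123 and 1125 are excluded by their earlier logins on 5-21 and 5-22.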

天蓬老师 · 2727 days ago · 762 views

All replies (2)

  • PHP中文网

    2017-05-02 09:21:34

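    A MongoDB find() cannot do a join or a NOT EXISTS subquery, but you can get the result in two steps: collect the uids that logged in before 2016-05-23, then look for 5-23 logins whose uid is not in that list.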

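    // uids that logged in at least once before 2016-05-23
    var oldUids = db.getCollection('user').distinct('uid', {'logTime': {'$lt': ISODate('2016-05-23')}});
    
    // distinct uids that logged in on 2016-05-23 and are not in oldUids
    var newUids = db.getCollection('user').distinct('uid', {
        'logTime': {'$gte': ISODate('2016-05-23'), '$lt': ISODate('2016-05-24')},
        'uid': {'$nin': oldUids}
    });
    newUids.length;  // number of new users on 2016-05-23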
    
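The $nin list above grows with every uid that has ever logged in and is sent back to the server in full. One possible alternative, sketched here assuming MongoDB 3.4 or later (for the $count stage), is a single aggregation that finds each uid's earliest login with $min and keeps the uids whose earliest login falls on the target day. The function name newUserPipeline is illustrative only, and the pipeline is built with plain Date objects so the snippet also runs in Node.js.

```javascript
// Builds an aggregation pipeline counting uids whose FIRST login falls in
// [dayStart, dayEnd). Plain Date objects behave like ISODate(...) values.
function newUserPipeline(dayStart, dayEnd) {
  return [
    // One output document per uid, carrying that uid's earliest logTime.
    { $group: { _id: '$uid', firstLogin: { $min: '$logTime' } } },
    // Keep only uids whose earliest login lies on the target day.
    { $match: { firstLogin: { $gte: dayStart, $lt: dayEnd } } },
    // Collapse the survivors into a single { newUsers: <n> } document.
    { $count: 'newUsers' }
  ];
}

var pipeline = newUserPipeline(
  new Date('2016-05-23T00:00:00Z'),
  new Date('2016-05-24T00:00:00Z')
);

// In the mongo shell (not executed here):
//   db.getCollection('user').aggregate(pipeline)
```

With the sample data from the question, the aggregation should return a single document with newUsers: 4 (uids 1124, 1126, 1127 and 1134). Note that the $group stage still reads every log document, so this does not remove the scanning cost discussed in the other reply; it only avoids building and shipping the $nin array on the client.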

  • 淡淡烟草味

    2017-05-02 09:21:34

    "Users who logged in on 5-23 but have no record before 5-23"

    First of all, this design has obvious flaws.

    1. Performance: "before the 23rd" covers an ever-growing range. One year, two years, even ten years ago all count as earlier than May 23rd, so how far back do you have to scan? With only a few days of data at the start it may be fine, but as the logs pile up your query will get slower and slower, and you will reach that point quickly. At that stage you will think of deleting old logs, which leads to the next problem.

    2. Hard to maintain: login logs are large and of little value to most people, so few keep them forever. Once you delete the logs older than a year, a batch of "new users" will suddenly appear, because the records of their earlier logins may be exactly the ones you deleted.
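The second flaw can be made concrete with a tiny log (a subset of the question's data) and the question's own definition of a new user. The cutoff date and the helper name countNewUsers are hypothetical; the point is only that purging old logs changes the count for the same day.

```javascript
// Subset of the question's data: 1123 first logged in on 5-21, 1124 on 5-23.
const fullLog = [
  { uid: '1123', logTime: new Date('2016-05-21T11:11:23Z') },
  { uid: '1123', logTime: new Date('2016-05-23T11:11:23Z') },
  { uid: '1124', logTime: new Date('2016-05-23T05:11:23Z') },
];

// Question's definition: logged in on the day, no record before the day.
function countNewUsers(records, dayStart, dayEnd) {
  const before = new Set(
    records.filter(r => r.logTime < dayStart).map(r => r.uid)
  );
  const onDay = new Set(
    records
      .filter(r => r.logTime >= dayStart && r.logTime < dayEnd)
      .map(r => r.uid)
  );
  let n = 0;
  for (const uid of onDay) if (!before.has(uid)) n++;
  return n;
}

const day = new Date('2016-05-23T00:00:00Z');
const nextDay = new Date('2016-05-24T00:00:00Z');

// Simulate a retention job that deletes every log older than a cutoff.
const cutoff = new Date('2016-05-22T00:00:00Z');
const purgedLog = fullLog.filter(r => r.logTime >= cutoff);

console.log(countNewUsers(fullLog, day, nextDay));   // 1 (only 1124)
console.log(countNewUsers(purgedLog, day, nextDay)); // 2 (1123 counts as new again)
```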

    Also, doesn't "new user" usually mean a newly registered user, counted on the day they registered? Is there a particular reason you must find new users with this algorithm?
    If you really must treat the first login day as the date a user becomes new, I recommend redesigning. For example, start from the user table: find the users who have never logged in, then periodically search the login logs for them, and as soon as a login is found, record that first login date in the user table. In theory, the number of registered users who have never logged in stays small and fairly stable no matter how much time passes (it does change, and such users do exist, but there are few of them), so your program's performance will not degrade over time.
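A minimal in-memory sketch of that redesign in plain JavaScript. The users table with a firstLogin field, the never-logged-in user 1999, and the helper names are all hypothetical. In MongoDB, the per-user lookup might be something like find({uid: ...}).sort({logTime: 1}).limit(1) on the login collection followed by an update that sets firstLogin; here both collections are plain arrays.

```javascript
// Hypothetical user table: firstLogin stays null until a login is found.
const users = [
  { uid: '1123', firstLogin: null },
  { uid: '1124', firstLogin: null },
  { uid: '1999', firstLogin: null }, // hypothetical: registered, never logged in
];

let logs = [
  { uid: '1123', logTime: new Date('2016-05-21T11:11:23Z') },
  { uid: '1123', logTime: new Date('2016-05-23T11:11:23Z') },
  { uid: '1124', logTime: new Date('2016-05-23T05:11:23Z') },
];

// Periodic job: only users whose firstLogin is still unknown are processed;
// for each of them, the earliest matching log entry is recorded.
function backfillFirstLogin(users, logs) {
  for (const user of users) {
    if (user.firstLogin !== null) continue;
    for (const r of logs) {
      if (r.uid === user.uid &&
          (user.firstLogin === null || r.logTime < user.firstLogin)) {
        user.firstLogin = r.logTime;
      }
    }
  }
}

// New users on a day are read from the user table alone.
function countNewUsers(users, dayStart, dayEnd) {
  return users.filter(u => u.firstLogin !== null &&
    u.firstLogin >= dayStart && u.firstLogin < dayEnd).length;
}

const day = new Date('2016-05-23T00:00:00Z');
const nextDay = new Date('2016-05-24T00:00:00Z');

backfillFirstLogin(users, logs);
const before = countNewUsers(users, day, nextDay); // 1 (only 1124)

// Purge logs older than 5-22 and run the job again.
logs = logs.filter(r => r.logTime >= new Date('2016-05-22T00:00:00Z'));
backfillFirstLogin(users, logs);
const after = countNewUsers(users, day, nextDay);  // still 1
```

The count stays at 1 after the purge because 1123's first login was stored before its 5-21 log was deleted. The sketch assumes the job runs before logs age out; otherwise it would record a later login as the first one.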

    I'm not answering your question directly, because experience tells me to fix a problem at its root rather than patch over a flawed design; patching only makes things worse and worse.

    PS: Isn't the logic of your last sentence a bit off? uid '1123' did log in on 5-23; presumably you mean it should not count as a new user that day.
