
java - count and aggregation counting problems in a MongoDB sharded cluster

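In a MongoDB sharded cluster, counting directly with count is inaccurate, while counting with an aggregation works.

However, when I call MongoDB from Java or from a MongoDB client (not the command line) and count with an aggregation, the result is just as inaccurate as count. My code is below. I can't find the cause; any pointers would be appreciated!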

@Test
public void testCount() throws Exception {
    long sT = System.currentTimeMillis();
    MongoDatasource mongoDatasource = MongoDatasource.getInstance(mongoService.getDatasource());
    DBCollection dbCollection = mongoDatasource.getDB().getCollection("dayFlow");
    // Pipeline stages; typed list instead of a raw List
    List<DBObject> arrayList = new ArrayList<>();
    // $match: keep only documents with usedDayFlow == 2
    DBObject dbObject1 = new BasicDBObject();
    dbObject1.put("usedDayFlow", 2);
    // $group: collapse all matches into one group and add 1 per document
    DBObject dbObject2 = new BasicDBObject();
    dbObject2.put("_id", null);
    dbObject2.put("count", new BasicDBObject("$sum", 1));
    arrayList.add(new BasicDBObject("$match", dbObject1));
    arrayList.add(new BasicDBObject("$group", dbObject2));
    System.out.println(JSON.serialize(arrayList));
    AggregationOutput size = dbCollection.aggregate(arrayList);
    System.out.println(size.results());
    System.out.println("Elapsed time: " + ((System.currentTimeMillis() - sT) / 1000) + "s");
}

Output:

[ { "$match" : { "usedDayFlow" : 2}} , { "$group" : { "_id" : null , "count" : { "$sum" : 1}}}]

[{ "_id" : null , "count" : 1002223}]

This count is somewhat higher than the actual number of documents. How should aggregation counting be done on a sharded cluster?

天蓬老师 · 2712 days ago

Replies (2)

  • PHP中文网

    PHP中文网 2017-04-18 10:53:35

    This problem has been solved. After switching to the latest driver, mongo-java-driver-3.4.0, the method below counts records accurately in sharded cluster mode. Thank you for your help!

    mongo shell >> db.collection.aggregate([{$match:{categories:"Bakery"}},{$group:{"_id":null,"count":{$sum:1}}}])

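        import com.mongodb.Block;
        import com.mongodb.MongoClient;
        import com.mongodb.MongoClientOptions;
        import com.mongodb.MongoCredential;
        import com.mongodb.ServerAddress;
        import com.mongodb.client.MongoCollection;
        import com.mongodb.client.MongoDatabase;
        import com.mongodb.client.model.Accumulators;
        import com.mongodb.client.model.Aggregates;
        import com.mongodb.client.model.Filters;
        import org.bson.Document;
        import org.bson.conversions.Bson;

        import java.util.Arrays;

        public long getCount() {
            // Placeholders: replace with your own user name, password, host and names.
            String user = "username";
            String database = "admin"; // authentication database
            String password = "password";
            MongoCredential credential = MongoCredential.createCredential(user, database, password.toCharArray());

            MongoClientOptions options = MongoClientOptions.builder()
                    .connectionsPerHost(10)
                    .threadsAllowedToBlockForConnectionMultiplier(10)
                    .socketTimeout(20000)
                    .connectTimeout(15000)
                    .maxWaitTime(50000)
                    .build();

            // ServerAddress takes the port as an int, not a String.
            // 27017 is MongoDB's default port and stands in for your own port here.
            MongoClient mongoClient = new MongoClient(new ServerAddress("host", 27017), Arrays.asList(credential), options);
            try {
                MongoDatabase mongoDatabase = mongoClient.getDatabase("databaseName");
                MongoCollection<Document> collection = mongoDatabase.getCollection("collectionName");

                // One-element array so the anonymous Block can store the result.
                final long[] count = new long[1];
                Block<Document> printBlock = new Block<Document>() {
                    @Override
                    public void apply(final Document document) {
                        // Go through Number so both Integer and Long sums are handled.
                        count[0] = ((Number) document.get("count")).longValue();
                    }
                };
                Bson bson = Filters.eq("categories", "Bakery");
                collection.aggregate(
                        Arrays.asList(
                                Aggregates.match(bson),
                                Aggregates.group(null, Accumulators.sum("count", 1L))
                        )
                ).forEach(printBlock);

                return count[0];
            } finally {
                // Release the connection pool even if the aggregation throws.
                mongoClient.close();
            }
        }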

  • 阿神

    阿神 2017-04-18 10:53:35

    Can you add some information in the comments? Thank you!

    I am copying the content of the comments here so it is easier to read:

    1. The difference between count and aggregate: in MongoDB, count and aggregate are implemented separately. On a sharded cluster, count can return an inaccurate result if orphaned documents exist or a chunk migration is in progress. The implementation of aggregate takes the sharded environment into account, so the official documentation recommends counting with aggregate on a sharded cluster.

    2. Counting with aggregate in the MongoDB shell and counting with aggregate through the Java MongoDB driver should give the same result, because both run the same aggregation.
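To make point 1 concrete, here is a minimal sketch in plain Java, with no MongoDB involved. It is a toy model under stated assumptions, not MongoDB's actual implementation: the classes `Doc` and `Shard`, the key ranges, and the methods `naiveCount`, `filteredCount` and `demoShards` are all hypothetical names made up for illustration. It shows how a stale copy left behind by a chunk migration (an orphaned document) inflates a count that simply sums what every shard holds, while a count that skips documents outside each shard's owned range does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShardCountSketch {

    /** A stored document; shardKey decides which shard owns it (hypothetical model). */
    static final class Doc {
        final int shardKey;
        final int usedDayFlow;

        Doc(int shardKey, int usedDayFlow) {
            this.shardKey = shardKey;
            this.usedDayFlow = usedDayFlow;
        }
    }

    /** A toy shard that owns the key range [lo, hi) but may also hold stale copies. */
    static final class Shard {
        final int lo;
        final int hi;
        final List<Doc> docs = new ArrayList<>();

        Shard(int lo, int hi) {
            this.lo = lo;
            this.hi = hi;
        }

        boolean owns(Doc d) {
            return d.shardKey >= lo && d.shardKey < hi;
        }
    }

    /** Sums every matching document on every shard, orphans included. */
    static long naiveCount(List<Shard> shards, int usedDayFlow) {
        long total = 0;
        for (Shard s : shards) {
            for (Doc d : s.docs) {
                if (d.usedDayFlow == usedDayFlow) {
                    total++;
                }
            }
        }
        return total;
    }

    /** Counts a matching document only on the shard that owns its key. */
    static long filteredCount(List<Shard> shards, int usedDayFlow) {
        long total = 0;
        for (Shard s : shards) {
            for (Doc d : s.docs) {
                if (s.owns(d) && d.usedDayFlow == usedDayFlow) {
                    total++;
                }
            }
        }
        return total;
    }

    /** Two shards; key 150 was migrated to shard B but a stale copy remains on A. */
    static List<Shard> demoShards() {
        Shard a = new Shard(0, 100);
        Shard b = new Shard(100, 200);
        a.docs.add(new Doc(10, 2));
        a.docs.add(new Doc(20, 2));
        a.docs.add(new Doc(30, 5));  // does not match usedDayFlow == 2
        a.docs.add(new Doc(150, 2)); // orphan: owned by B, still present on A
        b.docs.add(new Doc(150, 2));
        return Arrays.asList(a, b);
    }

    public static void main(String[] args) {
        System.out.println(naiveCount(demoShards(), 2));    // prints 4
        System.out.println(filteredCount(demoShards(), 2)); // prints 3
    }
}
```

In this toy data set there are three real documents with usedDayFlow equal to 2, and the naive sum reports four because the orphan is counted twice. That is the same direction of error as in the question, where the reported count was higher than the real one.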

    The issue you describe seems to be that the counts from the MongoDB shell and from the Java MongoDB driver disagree.

    I think this inconsistency may come from one of two things:

    1) an oversight in how the two results were compared;
    2) a bug in the Java MongoDB driver being used.
    

    For reference.

    Love MongoDB! Have Fun!


