MongoDB aggregation
Aggregation in MongoDB is mainly used to process data (such as statistical averages, sums, etc.) and return calculated data results. Somewhat similar to count(*) in sql statement.
aggregate() method
The aggregation method in MongoDB uses aggregate().
Syntax
The basic syntax format of the aggregate() method is as follows:
>db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)
Instance
The data in the collection is as follows:
{ _id: ObjectId(7df78ad8902c) title: 'MongoDB Overview', description: 'MongoDB is no sql database', by_user: 'w3cschool.cc', url: 'http://www.w3cschool.cc', tags: ['mongodb', 'database', 'NoSQL'], likes: 100 }, { _id: ObjectId(7df78ad8902d) title: 'NoSQL Overview', description: 'No sql database is very fast', by_user: 'w3cschool.cc', url: 'http://www.w3cschool.cc', tags: ['mongodb', 'database', 'NoSQL'], likes: 10 }, { _id: ObjectId(7df78ad8902e) title: 'Neo4j Overview', description: 'Neo4j is no sql database', by_user: 'Neo4j', url: 'http://www.neo4j.com', tags: ['neo4j', 'database', 'NoSQL'], likes: 750 },
Now we calculate the number of articles written by each author through the above collection, and use aggregate() to calculate the results as follows:
> db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : 1}}}]) { "result" : [ { "_id" : "w3cschool.cc", "num_tutorial" : 2 }, { "_id" : "Neo4j", "num_tutorial" : 1 } ], "ok" : 1 } >
The above example is similar to the sql statement: select by_user, count(*) from mycol group by by_user
In the above example, we group the data by the field by_user field and calculate the sum of the same values of the by_user field.
The following table shows some aggregate expressions:
Expression | Description | Instance |
---|---|---|
$sum | Calculate the sum. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}]) |
$avg | Calculate average | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}} }]) |
$min | Get the minimum value corresponding to all documents in the collection. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}]) |
$max | Get the maximum value corresponding to all documents in the collection. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}]) |
$push | Insert values into an array in the resulting document. | db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push: "$url"}}}]) |
$addToSet | Inserts values into an array in the resulting document, but does not create a copy. | db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}]) |
$first | Get the first document data according to the sorting of resource documents. | db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}]) |
$last | Get the last document data according to the sorting of resource documents | db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}]) |
The concept of pipes
Pipes are generally used in Unix and Linux to use the output of the current command as the parameters of the next command.
MongoDB's aggregation pipeline passes the results of MongoDB documents to the next pipeline for processing after one pipeline is completed. Pipeline operations are repeatable.
Expression: Process the input document and output it. The expression is stateless and can only be used to calculate the documents of the current aggregation pipeline and cannot process other documents.
Here we introduce several operations commonly used in the aggregation framework:
$project: Modify the structure of the input document. It can be used to rename, add or delete fields, and can also be used to create calculation results and nested documents.
$match: used to filter data and only output documents that meet the conditions. $match uses MongoDB's standard query operations.
$limit: used to limit the number of documents returned by the MongoDB aggregation pipeline.
$skip: Skip the specified number of documents in the aggregation pipeline and return the remaining documents.
$unwind: Split an array type field in the document into multiple strips, each containing a value in the array.
$group: Group documents in the collection and can be used for statistical results.
$sort: Sort the input documents and output them.
$geoNear: Outputs ordered documents close to a certain geographical location.
Pipeline operator instance
1, $project instance
db.article.aggregate( { $project : { title : 1 , author : 1 , }} );
In this case, only the result will be There are three fields: _id, tilt and author. By default, the _id field is included. If you want not to include _id, you can do this:
db.article.aggregate( { $project : { _id : 0 , title : 1 , author : 1 }});
2.$match instance
db.articles.aggregate( [ { $match : { score : { $gt : 70, $lte : 90 } } }, { $group: { _id: null, count: { $sum: 1 } } } ] );
$match is used to obtain records with scores greater than 70 and less than or equal to 90, and then sends the records that meet the conditions to the next stage $group pipeline operator for processing.
3.$skip instance
db.article.aggregate( { $skip : 5 });
After being processed by the $skip pipeline operator, the first five documents are "filtered" out.