MongoDB Map Reduce
Map-Reduce is a computing model. Simply put, it breaks a large batch of work (data) into pieces that are processed independently (map), and then merges the partial results into the final result (reduce).
The Map-Reduce provided by MongoDB is very flexible and quite practical for large-scale data analysis.
MapReduce command
The following is the basic syntax of MapReduce:
>db.collection.mapReduce(
   function() { emit(key, value); },                  // map function
   function(key, values) { return reduceFunction },   // reduce function
   {
      out: collection,
      query: document,
      sort: document,
      limit: number
   }
)
MapReduce requires two functions: a map function and a reduce function. The map function runs once for each document in the collection that matches the query and calls emit(key, value) to produce key-value pairs; the pairs are grouped by key and passed to the reduce function for processing.
The map function must call emit(key, value) to output a key-value pair.
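The flow described above can be sketched in plain JavaScript, without a database. This is only an illustration of the model, not MongoDB's implementation: simulateMapReduce, the module-level emit variable, and the sample documents are all hypothetical names introduced for the example.

```javascript
// Plain-JavaScript sketch of the map-reduce flow. This is NOT MongoDB's
// implementation; simulateMapReduce and the sample documents are hypothetical.
let emit; // stands in for the emit() that MongoDB provides inside map

function simulateMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> array of emitted values
  emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };

  // Map phase: call map once per document, with `this` bound to the document.
  for (const doc of docs) map.call(doc);

  // Reduce phase: collapse each key's values array into a single value.
  // (MongoDB skips reduce for keys with one value; this sketch calls it anyway.)
  const results = [];
  for (const [key, values] of groups) {
    results.push({ _id: key, value: reduce(key, values) });
  }
  return results;
}

// Hypothetical sample documents.
const docs = [
  { category: "fruit", qty: 3 },
  { category: "fruit", qty: 2 },
  { category: "tool", qty: 1 },
];

const totals = simulateMapReduce(
  docs,
  function () { emit(this.category, this.qty); },   // map
  function (key, values) {                          // reduce
    return values.reduce((sum, v) => sum + v, 0);
  }
);

console.log(JSON.stringify(totals));
// [{"_id":"fruit","value":5},{"_id":"tool","value":1}]
```

The map function keeps the same shape as in the mongo shell (it reads the current document through this and calls emit), which is why emit is a shared variable here rather than an argument.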
Parameter description:
map: The mapping function. It generates a sequence of key-value pairs that serve as input to the reduce function.
reduce: The aggregating function. Its task is to turn each (key, values) pair into a (key, value) pair, that is, to collapse the values array into a single value.
out: Where the results are stored. Current MongoDB versions require this option: pass a collection name to store the results in that collection, or { inline: 1 } to return them directly without storing them.
query: A filter condition. Only documents that match it are passed to the map function. (query, limit, and sort can be combined freely.)
sort: Sorts the documents before they are sent to the map function. Combined with limit, it can optimize the grouping mechanism.
limit: The maximum number of documents sent to the map function. (Without limit, sort alone is of little use.)
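How query, sort, and limit narrow the input before map runs can be sketched in plain JavaScript. This is an assumption-laden illustration: selectInput is a hypothetical helper, the query is modeled as a predicate function and the sort as a comparator (MongoDB takes documents for both), and the sample data is invented.

```javascript
// Sketch of how the options shape the input to map, in plain JavaScript.
// selectInput is a hypothetical helper; MongoDB does this work server-side.
function selectInput(docs, { query = () => true, sort = null, limit = Infinity } = {}) {
  const selected = docs.filter(query);   // query: keep only matching documents
  if (sort) selected.sort(sort);         // sort: order them before map sees them
  return selected.slice(0, limit);       // limit: cap how many reach map
}

// Hypothetical sample documents.
const docs = [
  { id: 1, status: "active", created: 30 },
  { id: 2, status: "disabled", created: 10 },
  { id: 3, status: "active", created: 20 },
  { id: 4, status: "active", created: 40 },
];

const input = selectInput(docs, {
  query: (d) => d.status === "active",
  sort: (a, b) => a.created - b.created,
  limit: 2,
});

console.log(JSON.stringify(input.map((d) => d.id)));
// [3,1]
```

The example also shows why sort matters mostly together with limit: the sort decides which two documents survive the cut.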
Using MapReduce
Consider the following documents, which store users' articles. Each document holds the article text, the author's user_name, and the article's status:
>db.posts.insert({ "post_text": "php中文网, the most complete technical documentation.", "user_name": "mark", "status": "active" })
WriteResult({ "nInserted" : 1 })
>db.posts.insert({ "post_text": "php中文网, the most complete technical documentation.", "user_name": "mark", "status": "active" })
WriteResult({ "nInserted" : 1 })
>db.posts.insert({ "post_text": "php中文网, the most complete technical documentation.", "user_name": "mark", "status": "active" })
WriteResult({ "nInserted" : 1 })
>db.posts.insert({ "post_text": "php中文网, the most complete technical documentation.", "user_name": "mark", "status": "active" })
WriteResult({ "nInserted" : 1 })
>db.posts.insert({ "post_text": "php中文网, the most complete technical documentation.", "user_name": "mark", "status": "disabled" })
WriteResult({ "nInserted" : 1 })
>db.posts.insert({ "post_text": "php中文网, the most complete technical documentation.", "user_name": "php", "status": "disabled" })
WriteResult({ "nInserted" : 1 })
>db.posts.insert({ "post_text": "php中文网, the most complete technical documentation.", "user_name": "php", "status": "disabled" })
WriteResult({ "nInserted" : 1 })
>db.posts.insert({ "post_text": "php中文网, the most complete technical documentation.", "user_name": "php", "status": "active" })
WriteResult({ "nInserted" : 1 })
Now we will use the mapReduce function on the posts collection to select published articles (status: "active"), group them by user_name, and count the articles for each user:
>db.posts.mapReduce(
   function() { emit(this.user_name, 1); },
   function(key, values) { return Array.sum(values) },
   {
      query: { status: "active" },
      out: "post_total"
   }
)
The mapReduce call above returns:
{
   "result" : "post_total",
   "timeMillis" : 23,
   "counts" : {
      "input" : 5,
      "emit" : 5,
      "reduce" : 1,
      "output" : 2
   },
   "ok" : 1
}
The result shows that 5 documents match the query condition (status: "active"). The map function emitted 5 key-value pairs, and the reduce step grouped the pairs with the same key into 2 groups.
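The counts can be checked by hand with a plain-JavaScript re-creation of the run over the eight sample posts. This is a sketch, not MongoDB's implementation: the counters and the grouping Map are hypothetical stand-ins, and Array.sum (a legacy mongo shell helper) is replaced by an ordinary reduce over the array. On a real server the reduce count can be higher for large inputs, because reduce may be called more than once per key.

```javascript
// Plain-JavaScript re-creation of the counts for the eight sample posts.
// Not MongoDB's implementation; the counters and grouping are hypothetical.
const posts = [
  ...Array(4).fill({ user_name: "mark", status: "active" }),
  { user_name: "mark", status: "disabled" },
  { user_name: "php", status: "disabled" },
  { user_name: "php", status: "disabled" },
  { user_name: "php", status: "active" },
];

const counts = { input: 0, emit: 0, reduce: 0, output: 0 };
const groups = new Map(); // key -> array of emitted values

const emit = (key, value) => {
  counts.emit++;
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
};

const map = function () { emit(this.user_name, 1); };
const reduce = function (key, values) {
  counts.reduce++;
  return values.reduce((sum, v) => sum + v, 0); // stands in for Array.sum
};

for (const post of posts) {
  if (post.status !== "active") continue; // query: { status: "active" }
  counts.input++;
  map.call(post);
}

const results = [];
for (const [key, values] of groups) {
  // MongoDB does not call reduce for a key that has only one value.
  const value = values.length > 1 ? reduce(key, values) : values[0];
  results.push({ _id: key, value });
}
counts.output = results.length;

console.log(JSON.stringify(counts));
// {"input":5,"emit":5,"reduce":1,"output":2}
console.log(JSON.stringify(results));
// [{"_id":"mark","value":4},{"_id":"php","value":1}]
```

The single reduce call comes from mark's four values; php has only one emitted value, so it passes through without a reduce call.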
Field descriptions:
result: The name of the collection that stores the results. Because out names a collection ("post_total") in this example, the collection is kept after the connection closes, until it is dropped explicitly.
timeMillis: The execution time, in milliseconds.
input: The number of documents that match the query and are sent to the map function.
emit: The number of times emit is called in the map function, that is, the total number of key-value pairs produced.
reduce: The number of times the reduce function is called.
output: The number of documents in the result collection. (The counts are very helpful for debugging.)
ok: Whether the command succeeded; 1 means success.
err: If the command fails, the reason for the failure may appear here, although in practice the message is often vague and of limited use.
Use the find() method to view the mapReduce results:
>db.posts.mapReduce(
   function() { emit(this.user_name, 1); },
   function(key, values) { return Array.sum(values) },
   {
      query: { status: "active" },
      out: "post_total"
   }
).find()
The query above returns the following results. User mark has 4 published articles and user php has 1:
{ "_id" : "mark", "value" : 4 }
{ "_id" : "php", "value" : 1 }
In a similar way, MapReduce can be used to build large, complex aggregation queries.
The map and reduce functions are written in JavaScript, which makes MapReduce very flexible and powerful.
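Because the functions are ordinary JavaScript, their behavior can be checked outside the database. One property worth checking: MongoDB may call reduce more than once for the same key and feed earlier results back in, so a reduce function should give the same answer either way. The sketch below tests that in plain JavaScript; isReReducible and both sample reducers are hypothetical names introduced for the illustration.

```javascript
// Checks whether a reduce function tolerates being fed its own output.
// isReReducible and the two reducers are hypothetical, for illustration only.
const sumReduce = (key, values) => values.reduce((sum, v) => sum + v, 0);
const lengthReduce = (key, values) => values.length; // tempting for counting, but unsafe

function isReReducible(reduce, key, values) {
  const mid = Math.floor(values.length / 2);
  const oneShot = reduce(key, values);                 // all values at once
  const partial = reduce(key, values.slice(0, mid));   // first half reduced early
  const twoStep = reduce(key, [partial, ...values.slice(mid)]); // result fed back in
  return oneShot === twoStep;
}

const emitted = [1, 1, 1, 1]; // what map emits for a user with four active posts

console.log(isReReducible(sumReduce, "mark", emitted));    // true
console.log(isReReducible(lengthReduce, "mark", emitted)); // false
```

Summing the emitted 1s, as the example in this article does, passes the check; returning values.length does not, because a partial result would then be counted as a single value.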