Home  >  Article  >  Backend Development  >  Mongodb mapreduce usage and php sample code

Mongodb mapreduce usage and php sample code

WBOY
WBOYOriginal
2016-08-08 09:30:301246browse
Although MongoDB is not as convenient for grouping as the group by function of our commonly used relational databases such as mysql, sqlserver, oracle, MongoDB also has three ways to achieve grouping: * Mongodb's three grouping methods: * 1. group (First filter and then group, does not support sharding, limits the amount of data, and is not efficient) * 2. mapreduce (based on js engine, single-threaded execution, low efficiency, suitable for background statistics, etc.) * 3. Aggregate (recommended) (If your PHP mongodb driver version requires >=1.3.0, it is recommended that you use aggregate, which has much higher performance and is simpler to use. However, 1.3 does not currently support account authentication. mode, you can check the update log and bugs through http://pecl.php.net/package/mongo)Let’s take a look at the mapreduce method:Mongodb official website introduces MapReduce:Map/reduce in MongoDB is useful for batch processing of data and aggregation operations. It is similar in spirit to using something like Hadoop with all input coming from a collection and output going to a collection. Often, in a situation where you would have used GROUP BY in SQL, map/reduce is the right tool in MongoDB.roughly means: Map/reduce in Mongodb is mainly used for batch processing and aggregation of data, which is somewhat similar to using Hadoop to aggregate data For processing, all input data are obtained from the collection, and the data output after MapReduce will also be written to the collection. Usually similar to how we use the Group By statement in SQL. Using MapReduce requires implementing two functions: Map and Reduce. The Map function calls emit(key, value) to traverse all the records in the collection, and passes the key and value to the Reduce function for processing. Map functions and Reduce functions are written in Javascript and can perform MapReduce operations through db.runCommand or mapreduce commands.
The MapReduce command is as follows: db.runCommand( { mapreduce : <collection>, map : <mapfunction>, reduce : <reducefunction> [, query : <queryfilterobject>] [, sort : <sortthequery.usefulforoptimization>] [, limit : <numberofobjectstoreturnfromcollection>] [, out : <output-collectionname>] [, keeptemp: <true|false>] [, finalize : <finalizefunction>] [, scope : <objectwherefieldsgointojavascriptglobalscope >] [, verbose : true] } );

Parameter description:

mapreduce: the target set to be operated on

map: mapping function (generates a sequence of key-value pairs as a parameter of the Reduce function)

reduce: statistical function

query: target record filtering

sort: sort target records

limit: limit the number of target records

out: statistical result storage collection (if not specified, a temporary collection is used, which is interrupted on the client Automatically deleted after opening)

keeptemp: Whether to keep the temporary collection

finalize: Final processing function (performs final sorting on the reduce return results and stores them in the result set)

scope: Import external variables to map, reduce, and finalize

verbose : Display detailed time statistics information


map function
map function calls the current object, handles the object's attributes, and passes the value to reduce. The map method uses this to operate the current object and calls emit( ​​at least once key, value) method to provide parameters to reduce, where the key of emit is the id of the final data.
reduce函数 
接收一个值和数组,根据需要对数组进行合并分组等处理,reduce的key就是emit(key,value)的key,value_array是同个key对应的多个value数组。 
Finalize函数 
此函数为可选函数,可在执行完map和reduce后执行,对最后的数据进行统一处理。 
看完基本介绍,我们再来看一个实例:已知集合feed,测试数据如下:{ "_id": ObjectId("50ccb3f91e937e2927000004"), "feed_type": 1, "to_user": 234, "time_line": "2012-12-16 01:26:00" }{ "_id": ObjectId("50ccb3ef1e937e0727000004"), "feed_type": 8, "to_user": 123, "time_line": "2012-12-16 01:26:00" }{ "_id": ObjectId("50ccb3e31e937e0a27000003"), "feed_type": 1, "to_user": 123, "time_line": "2012-12-16 01:26:00" }{ "_id": ObjectId("50ccb3d31e937e0927000001"), "feed_type": 1, "to_user": 123, "time_line": "2012-12-16 01:26:00" }
我们按动态类型feed_type和用户to_user进行分组统计,实现结果:
feed_type to_user cout
1 234 1
8 123 1
1 123 2







实现代码://编写map函数$map = ' function() { var key = {to_user:this.to_user,feed_type:this.feed_type}; var value = {count:1}; emit(key,value); } '; //reduce 函数$reduce = ' function(key, values) { var ret = {count:0}; for(var i in values) { ret.count += 1; } return ret; }'; //查询条件$query = null; //本实例中没有查询条件,设置为null$mongo = new Mongo('mongodb://root:root@127.0.0.1: 28017/'); //链接mongodb,账号和密码为root,root$instance = $mongo->selectDB("testdb"); //执行此命令后,会创建feed_temp_res的临时集合,并将统计后的数据放在该集合中$cmd = $instance->command(array( 'mapreduce' => 'feed', 'map' => $map, 'reduce' => $reduce, 'query' => $query, 'out' => 'feed_temp_res' )); //查询临时集合中的统计数据,验证统计结果是否和预期结果一致$cursor = $instance->selectCollection('feed_temp_res')->find(); $result = array(); try { while ($cursor->hasNext()) { $result[] = $cursor->getNext(); } } catch (MongoConnectionException $e) { echo$e->getMessage(); } catch (MongoCursorTimeoutException $e) { echo$e->getMessage(); } catch(Exception$e){ echo$e->getMessage(); } //test var_dump($result);
下面是输出的结果,和预期结果一致{ "_id": { "to_user": 234, "feed_type": 1 }, "value": { "count": 1 }}{ "_id": { "to_user": 123, "feed_type": 8 }, "value": { "count": 1 }}{ "_id": { "to_user": 123, "feed_type": 1 }, "value": { "count": 2 }}
以上只是简单的统计实现,你可以实现复杂的条件统计编写复杂的reduce函数,可以增加查询条件,排序等等。附上mapReduce数据库处理函数(简单封装)/** * mapReduce分组 * * @param string $table_name 表名(要操作的目标集合名) * @param string $map 映射函数(生成键值对序列,作为 reduce 函数参数) * @param string $reduce 统计处理函数 * @param array $query 过滤条件 如:array('uid'=>123) * @param array $sort 排序 * @param number $limit 限制的目标记录数 * @param string $out 统计结果存放集合 (不指定则使用tmp_mr_res_$table_name, 1.8以上版本需指定) * @param bool $keeptemp 是否保留临时集合 * @param string $finalize 最终处理函数 (对reduce返回结果进行最终整理后存入结果集合) * @param string $scopemap、reduce、finalize 导入外部js变量 * @param bool $jsMode 是否减少执行过程中BSON和JS的转换,默认true(注:false时 BSON-->JS-->map-->BSON-->JS-->reduce-->BSON,可处理非常大的mapreduce,//true时BSON-->js-->map-->reduce-->BSON) * @param bool $verbose 是否产生更加详细的服务器日志 * @param bool $returnresult 是否返回新的结果集 * @param array &$cmdresult 返回mp命令执行结果 array("errmsg"=>"","code"=>13606,"ok"=>0) ok=1表示执行命令成功 * @return*/ function mapReduce($table_name,$map,$reduce,$query=null,$sort=null,$limit=0,$out='',$keeptemp=true,$finalize=null,$scope=null,$jsMode=true,$verbose=true,$returnresult=true,&$cmdresult){ if(empty($table_name) || empty($map) || empty($reduce)){ return null; } $map = new MongoCode($map); $reduce = new MongoCode($reduce); if(empty($out)){ $out = 'tmp_mr_res_'.$table_name; } $cmd = array( 'mapreduce' => $table_name, 'map' => $map, 'reduce' => $reduce, 'out' =>$out ); if(!empty($query) && is_array($query)){ array_push($cmd, array('query'=>$query)); } if(!empty($sort) && is_array($sort)){ array_push($cmd, array('sort'=>$query)); } if(!empty($limit) && is_int($limit) && $limit>0){ array_push($cmd, array('limit'=>$limit)); } if(!empty($keeptemp) && is_bool($keeptemp)){ array_push($cmd, array('keeptemp'=>$keeptemp)); } if(!empty($finalize)){ $finalize = new Mongocode($finalize); array_push($cmd, array('finalize'=>$finalize)); } if(!empty($scope)){ array_push($cmd, array('scope'=>$scope)); } if(!empty($jsMode) && is_bool($jsMode)){ array_push($cmd, array('jsMode'=>$jsMode)); } if(!empty($verbose) && is_bool($verbose)){ array_push($cmd, array('verbose'=>$verbose)); } $dbname = $this->curr_db_name; $cmdresult = $this->mongo->$dbname->command($cmd); if($returnresult){ if($cmdresult && $cmdresult['ok']==1){ $result = $this->find($out, array()); } } if($keeptemp==false){ //删除集合 $this->mongo->$dbname->dropCollection($out); } return$result; }
MongoDB官方网站介绍:MapReduce介绍  http://docs.mongodb.org/manual/core/map-reduce/Aggregation介绍  http://docs.mongodb.org/manual/aggregation/ 

以上就介绍了mongodb的mapreduce用法及php示例代码,包括了方面的内容,希望对PHP教程有兴趣的朋友有所帮助。

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Previous article:PHP time operationsNext article:PHP time operations