search
HomeBackend DevelopmentPHP TutorialUse of MapReduce in MongoDB

Use of MapReduce in MongoDB

Dec 08, 2017 pm 02:26 PM
mapreducemongodbuse

Friends who have played Hadoop should be familiar with MapReduce. MapReduce is powerful and flexible. It can split a large problem into multiple small problems and send each small problem to different machines for processing. All machines After all calculations are completed, the calculation results are combined into a complete solution. This is called distributed computing. In this article, we will take a look at the use of MapReduce in MongoDB.

mapReduce

MapReduce in MongoDB can be used to implement more complex aggregation commands. Using MapReduce mainly implements two functions: map function and reduce function. The map function is used to generate a sequence of key-value pairs. The result of the map function is used as a parameter of the reduce function. Further statistics are done in the reduce function. For example, my data set is as follows:

{"_id" : ObjectId("59fa71d71fd59c3b2cd908d7"),"name" : "鲁迅","book" : "呐喊","price" : 38.0,"publisher" : "人民文学出版社"}
{"_id" : ObjectId("59fa71d71fd59c3b2cd908d8"),"name" : "曹雪芹","book" : "红楼梦","price" : 22.0,"publisher" : "人民文学出版社"}
{"_id" : ObjectId("59fa71d71fd59c3b2cd908d9"),"name" : "钱钟书","book" : "宋诗选注","price" : 99.0,"publisher" : "人民文学出版社"}
{"_id" : ObjectId("59fa71d71fd59c3b2cd908da"),"name" : "钱钟书","book" : "谈艺录","price" : 66.0,"publisher" : "三联书店"}
{"_id" : ObjectId("59fa71d71fd59c3b2cd908db"),"name" : "鲁迅","book" : "彷徨","price" : 55.0,"publisher" : "花城出版社"}

If I want to query each author The total price of the books published, the operation is as follows:

var map=function(){emit(this.name,this.price)}
var reduce=function(key,value){return Array.sum(value)}
var options={out:"totalPrice"}
db.sang_books.mapReduce(map,reduce,options);
db.totalPrice.find()

emit function is mainly used to implement grouping and receives two parameters. The first parameter represents the grouping field, and the second parameter represents the data to be counted. Reduce performs specific data processing operations and receives two parameters, corresponding to the two parameters of the emit method. Here, the sum function in Array is used to perform self-processing on the price field. Options defines a set for outputting the results. At that time, we The data will be queried in this collection. By default, this collection will be retained even after the database is restarted, and the data in the collection will be retained. The query results are as follows:

{
    "_id" : "曹雪芹",
    "value" : 22.0
}
{
    "_id" : "钱钟书",
    "value" : 165.0
}
{
    "_id" : "鲁迅",
    "value" : 93.0
}

For another example, I want to query how many books each author has published, as follows:

var map=function(){emit(this.name,1)}
var reduce=function(key,value){return Array.sum(value)}
var options={out:"bookNum"}
db.sang_books.mapReduce(map,reduce,options);
db.bookNum.find()

The query results are as follows:

{
    "_id" : "曹雪芹",
    "value" : 1.0
}
{
    "_id" : "钱钟书",
    "value" : 2.0
}
{
    "_id" : "鲁迅",
    "value" : 2.0
}

Put each author’s The books are listed as follows:

var map=function(){emit(this.name,this.book)}
var reduce=function(key,value){return value.join(',')}
var options={out:"books"}
db.sang_books.mapReduce(map,reduce,options);
db.books.find()

The results are as follows:

{
    "_id" : "曹雪芹",
    "value" : "红楼梦"
}
{
    "_id" : "钱钟书",
    "value" : "宋诗选注,谈艺录"
}
{
    "_id" : "鲁迅",
    "value" : "呐喊,彷徨"
}

For example, if you query the books that each person sells for more than ¥40:

var map=function(){emit(this.name,this.book)}
var reduce=function(key,value){return value.join(',')}
var options={query:{price:{$gt:40}},out:"books"}
db.sang_books.mapReduce(map,reduce,options);
db.books.find()

query means to check the found set Filter again.

The results are as follows:

{
    "_id" : "钱钟书",
    "value" : "宋诗选注,谈艺录"
}
{
    "_id" : "鲁迅",
    "value" : "彷徨"
}

runCommand implementation

We can also use the runCommand command to execute MapReduce. The format is as follows:

db.runCommand(
               {
                 mapReduce: <collection>,
                 map: <function>,
                 reduce: <function>,
                 finalize: <function>,
                 out: <output>,
                 query: <document>,
                 sort: <document>,
                 limit: <number>,
                 scope: <document>,
                 jsMode: <boolean>,
                 verbose: <boolean>,
                 bypassDocumentValidation: <boolean>,
                 collation: <document>
               }
             )</document></boolean></boolean></boolean></document></number></document></document></output></function></function></function></collection>

The meaning is as follows:

##reduce reduce functionfinalizeFinal processing functionoutOutput Collection of ##query##sortSort the resultslimitThe number of results returnedscopeSet the parameter value, the value set here is in map , visible in the reduce and finalize functionsjsMode Whether to convert the intermediate data of map execution from javascript object to BSON object, the default is falseverboseWhether to display detailed time statisticsbypassDocumentValidationWhether to bypass document validationcollationSome other collationsThe following operation means performing a MapReduce operation and limiting the number of items returned to the statistical collection, limiting the return After the number of items, the statistical operation is performed, as follows:
var map=function(){emit(this.name,this.book)}
var reduce=function(key,value){return value.join(',')}
db.runCommand({mapreduce:'sang_books',map,reduce,out:"books",limit:4,verbose:true})
db.books.find()
Parameter Meaning
mapReduce Represents the collection to be operated
map map function
Filter the results
The execution results are as follows:

{
    "_id" : "曹雪芹",
    "value" : "红楼梦"
}
{
    "_id" : "钱钟书",
    "value" : "宋诗选注,谈艺录"
}
{
    "_id" : "鲁迅",
    "value" : "呐喊"
}
My friends saw that one of Lu Xun’s books was missing, because limit first limits the collection of returned items. number, and then perform statistical operations.

The finalize operation represents the final processing function, as follows:

var f1 = function(key,reduceValue){var obj={};obj.author=key;obj.books=reduceValue; return obj}
var map=function(){emit(this.name,this.book)}
var reduce=function(key,value){return value.join(',')}
db.runCommand({mapreduce:'sang_books',map,reduce,out:"books",finalize:f1})
db.books.find()
f1 The first parameter key represents the first parameter in emit, and the second parameter represents the execution result of reduce. We can This result is reprocessed in f1, and the result is as follows:

{
    "_id" : "曹雪芹",
    "value" : {
        "author" : "曹雪芹",
        "books" : "红楼梦"
    }
}
{
    "_id" : "钱钟书",
    "value" : {
        "author" : "钱钟书",
        "books" : "宋诗选注,谈艺录"
    }
}
{
    "_id" : "鲁迅",
    "value" : {
        "author" : "鲁迅",
        "books" : "呐喊,彷徨"
    }
}
scope can be used to define a variable that is visible in map, reduce, and finalize, as follows:

var f1 = function(key,reduceValue){var obj={};obj.author=key;obj.books=reduceValue;obj.sang=sang; return obj}
var map=function(){emit(this.name,this.book)}
var reduce=function(key,value){return value.join(',--'+sang+'--,')}
db.runCommand({mapreduce:'sang_books',map,reduce,out:"books",finalize:f1,scope:{sang:"haha"}})
db.books.find()
The execution result is as follows :

{
    "_id" : "曹雪芹",
    "value" : {
        "author" : "曹雪芹",
        "books" : "红楼梦",
        "sang" : "haha"
    }
}
{
    "_id" : "钱钟书",
    "value" : {
        "author" : "钱钟书",
        "books" : "宋诗选注,--haha--,谈艺录",
        "sang" : "haha"
    }
}
{
    "_id" : "鲁迅",
    "value" : {
        "author" : "鲁迅",
        "books" : "呐喊,--haha--,彷徨",
        "sang" : "haha"
    }
}
I hope you will gain something from this article.

Related recommendations:

MongoDB mapreduce usage and PHP sample code

How to increase MongoDB MapReduce speed by 20 times

Implementing MapReduce in Oracle database

The above is the detailed content of Use of MapReduce in MongoDB. For more information, please follow other related articles on the PHP Chinese website!

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
PHP Performance Tuning for High Traffic WebsitesPHP Performance Tuning for High Traffic WebsitesMay 14, 2025 am 12:13 AM

ThesecrettokeepingaPHP-poweredwebsiterunningsmoothlyunderheavyloadinvolvesseveralkeystrategies:1)ImplementopcodecachingwithOPcachetoreducescriptexecutiontime,2)UsedatabasequerycachingwithRedistolessendatabaseload,3)LeverageCDNslikeCloudflareforservin

Dependency Injection in PHP: Code Examples for BeginnersDependency Injection in PHP: Code Examples for BeginnersMay 14, 2025 am 12:08 AM

You should care about DependencyInjection(DI) because it makes your code clearer and easier to maintain. 1) DI makes it more modular by decoupling classes, 2) improves the convenience of testing and code flexibility, 3) Use DI containers to manage complex dependencies, but pay attention to performance impact and circular dependencies, 4) The best practice is to rely on abstract interfaces to achieve loose coupling.

PHP Performance: is it possible to optimize the application?PHP Performance: is it possible to optimize the application?May 14, 2025 am 12:04 AM

Yes,optimizingaPHPapplicationispossibleandessential.1)ImplementcachingusingAPCutoreducedatabaseload.2)Optimizedatabaseswithindexing,efficientqueries,andconnectionpooling.3)Enhancecodewithbuilt-infunctions,avoidingglobalvariables,andusingopcodecaching

PHP Performance Optimization: The Ultimate GuidePHP Performance Optimization: The Ultimate GuideMay 14, 2025 am 12:02 AM

ThekeystrategiestosignificantlyboostPHPapplicationperformanceare:1)UseopcodecachinglikeOPcachetoreduceexecutiontime,2)Optimizedatabaseinteractionswithpreparedstatementsandproperindexing,3)ConfigurewebserverslikeNginxwithPHP-FPMforbetterperformance,4)

PHP Dependency Injection Container: A Quick StartPHP Dependency Injection Container: A Quick StartMay 13, 2025 am 12:11 AM

APHPDependencyInjectionContainerisatoolthatmanagesclassdependencies,enhancingcodemodularity,testability,andmaintainability.Itactsasacentralhubforcreatingandinjectingdependencies,thusreducingtightcouplingandeasingunittesting.

Dependency Injection vs. Service Locator in PHPDependency Injection vs. Service Locator in PHPMay 13, 2025 am 12:10 AM

Select DependencyInjection (DI) for large applications, ServiceLocator is suitable for small projects or prototypes. 1) DI improves the testability and modularity of the code through constructor injection. 2) ServiceLocator obtains services through center registration, which is convenient but may lead to an increase in code coupling.

PHP performance optimization strategies.PHP performance optimization strategies.May 13, 2025 am 12:06 AM

PHPapplicationscanbeoptimizedforspeedandefficiencyby:1)enablingopcacheinphp.ini,2)usingpreparedstatementswithPDOfordatabasequeries,3)replacingloopswitharray_filterandarray_mapfordataprocessing,4)configuringNginxasareverseproxy,5)implementingcachingwi

PHP Email Validation: Ensuring Emails Are Sent CorrectlyPHP Email Validation: Ensuring Emails Are Sent CorrectlyMay 13, 2025 am 12:06 AM

PHPemailvalidationinvolvesthreesteps:1)Formatvalidationusingregularexpressionstochecktheemailformat;2)DNSvalidationtoensurethedomainhasavalidMXrecord;3)SMTPvalidation,themostthoroughmethod,whichchecksifthemailboxexistsbyconnectingtotheSMTPserver.Impl

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

SAP NetWeaver Server Adapter for Eclipse

SAP NetWeaver Server Adapter for Eclipse

Integrate Eclipse with SAP NetWeaver application server.

MinGW - Minimalist GNU for Windows

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

ZendStudio 13.5.1 Mac

ZendStudio 13.5.1 Mac

Powerful PHP integrated development environment

mPDF

mPDF

mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),