Writing MapReduce jobs using Python

高洛峰
2016-10-18 10:28:52

mrjob lets you write MapReduce jobs in Python 2.5+ and run them on several platforms. You can:

Write multi-step MapReduce jobs using pure Python

Test on your local machine

Run on a Hadoop cluster

Use Amazon Elastic MapReduce (EMR) to run on the cloud

Installation with pip is simple and requires no configuration. Just run: pip install mrjob

Code example:

from mrjob.job import MRJob


class MRWordCounter(MRJob):
    def mapper(self, key, line):
        # Emit (word, 1) for every whitespace-separated word in the line.
        for word in line.split():
            yield word, 1

    def reducer(self, word, occurrences):
        # Sum all the 1s emitted for this word.
        yield word, sum(occurrences)


if __name__ == '__main__':
    MRWordCounter.run()
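
To make the data flow of the word counter above easier to follow, here is a minimal sketch that imitates the map, shuffle, and reduce phases in plain Python, without mrjob. The helper name run_local and the sample input are assumptions made for illustration only. mrjob's real runners do considerably more (input splitting, serialization, distribution across machines), so this is a mental model rather than how the library is implemented.

```python
from collections import defaultdict


def mapper(_, line):
    # Same logic as MRWordCounter.mapper: emit (word, 1) for each word.
    for word in line.split():
        yield word, 1


def reducer(word, occurrences):
    # Same logic as MRWordCounter.reducer: sum the counts for one word.
    yield word, sum(occurrences)


def run_local(lines):
    """Hypothetical helper: simulate map, shuffle, and reduce in memory."""
    grouped = defaultdict(list)
    # Map phase: run the mapper on every input line.
    for line in lines:
        for key, value in mapper(None, line):
            # Shuffle: collect all values emitted for the same key.
            grouped[key].append(value)
    # Reduce phase: run the reducer once per distinct key.
    result = {}
    for key in sorted(grouped):
        for out_key, out_value in reducer(key, grouped[key]):
            result[out_key] = out_value
    return result


if __name__ == '__main__':
    # Sample input is made up for the example.
    print(run_local(["the quick brown fox", "the lazy dog"]))
```

With mrjob itself, the job class is usually saved to a script and given an input file on the command line, for example python word_count.py input.txt (both file names are placeholders). The -r option selects the runner, such as -r hadoop for a Hadoop cluster or -r emr for Elastic MapReduce.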

