Example of implementing the MapReduce pattern in Python

高洛峰
2016-11-21 14:45:07

MapReduce is a pattern borrowed from functional programming languages. In some scenarios it can greatly simplify code. Let's first look at what MapReduce is:

MapReduce is a software architecture proposed by Google for parallel computation over large-scale data sets (larger than 1 TB). The concepts "Map" and "Reduce", and the main ideas behind them, are borrowed from functional programming languages, along with some features borrowed from vector programming languages.
In current implementations, the user specifies a Map (mapping) function, which maps a set of key-value pairs to a new set of key-value pairs, and a concurrent Reduce (reduction) function, which ensures that all mapped key-value pairs sharing the same key are processed together.
Simply put, MapReduce splits the problem into two parts: Map and Reduce. The data to be processed is treated as a sequence. The Map function is applied to each element of the sequence, and the Reduce function then aggregates the mapped results into the final result.
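
The key-value description above can be illustrated with a minimal single-process sketch. The function names (`map_words`, `group_by_key`, `reduce_counts`) and the sample input are hypothetical and chosen only for illustration; they are not part of any MapReduce library.

```python
from collections import defaultdict
from functools import reduce

def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one line."""
    return [(word, 1) for word in line.split()]

def group_by_key(pairs):
    """Group step: collect all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(groups):
    """Reduce step: fold each key's values into a single total."""
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}

lines = ["a b a", "b c"]  # hypothetical sample input
pairs = [pair for line in lines for pair in map_words(line)]
word_counts = reduce_counts(group_by_key(pairs))
print(word_counts)
```

Here the Map step only looks at one line at a time, so it could run in parallel, while the Reduce step only needs the values that share a key.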

The following uses the MapReduce pattern to implement a simple program that counts how many times each word occurs in a log file:

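from collections import Counter
from functools import reduce
from multiprocessing import Pool
from operator import add

def read_inputs(file):
    # Yield the list of whitespace-separated words on each line.
    for line in file:
        yield line.strip().split()

def count(file_name):
    # Map step: count the words in a single file.
    c = Counter()
    with open(file_name) as file:  # the file is closed on exit
        for words in read_inputs(file):
            c.update(words)
    return c

def do_task():
    # The same file is listed 10000 times to simulate a large batch of jobs.
    job_list = ['log.txt'] * 10000
    with Pool(8) as pool:  # 8 worker processes
        partial_counts = pool.map(count, job_list)
    # Reduce step: merge the per-file counters into one.
    return reduce(add, partial_counts)

if __name__ == "__main__":
    rv = do_task()
    print(rv.most_common(10))  # show the ten most frequent words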
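
To try the Reduce step without a `log.txt` file, the following sketch merges per-chunk counters in a single process. It assumes hypothetical in-memory chunks in place of log files and uses the built-in `map` as a serial stand-in for `pool.map`; the helper name `count_lines` is also made up for this example.

```python
from collections import Counter
from functools import reduce
from operator import add

def count_lines(lines):
    """Map step: count the words in one chunk of lines."""
    counter = Counter()
    for line in lines:
        counter.update(line.split())
    return counter

# Hypothetical chunks standing in for separate log files.
chunks = [
    ["GET /index", "GET /about"],
    ["POST /login", "GET /index"],
]

partial_counts = map(count_lines, chunks)       # serial stand-in for pool.map
total = reduce(add, partial_counts, Counter())  # Reduce step: merge the counters
top_two = total.most_common(2)
print(top_two)
```

Passing an empty `Counter()` as the initial value lets `reduce` return an empty result instead of raising an error when there are no chunks.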

