Writing Hadoop MapReduce programs using PHP and Shell

Hadoop Streaming enables any executable program that supports standard I/O (stdin, stdout) to act as a Hadoop mapper or reducer. For example:

hadoop jar hadoop-streaming.jar -input SOME_INPUT_DIR_OR_FILE -output SOME_OUTPUT_DIR -mapper /bin/cat -reducer /usr/bin/wc

In this example, the cat and wc tools that come with Unix/Linux are used as mapper/reducer. Isn’t it amazing?
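To get a feel for what such a job does, the data flow can be approximated on a single machine as mapper, then sort, then reducer. The following is only a rough sketch: `simulate_streaming` is a hypothetical helper, not part of Hadoop, and it ignores partitioning and multiple reducers.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): approximate a single-reducer
# streaming job locally. $1 is the mapper command, $2 the reducer command.
simulate_streaming() {
    # stdin -> mapper -> sort (stand-in for the shuffle phase) -> reducer
    sh -c "$1" | LC_ALL=C sort | sh -c "$2"
}

# Three input lines through cat (mapper) and wc -l (reducer);
# prints the line count, 3.
printf 'hello world\nhello hadoop\nbye\n' | simulate_streaming cat 'wc -l'
```

The mapper and reducer are passed as command strings, which mirrors how the -mapper and -reducer options are given on the command line above.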

If you are used to a dynamic language, you can write MapReduce programs in that language, and it is no different from the programming you have done before: Hadoop is just the framework that runs them. Below I demonstrate how to implement a MapReduce word counter in PHP.

1. Find the Streaming jar

There is no hadoop-streaming.jar in the Hadoop root directory. Because Streaming is a contrib module, you have to look for it under the contrib directory. Taking hadoop-0.20.2 as an example, it is here:

$HADOOP_HOME/contrib/streaming/hadoop-0.20.2-streaming.jar

2. Write Mapper

Create a new wc_mapper.php and write the following code:

#!/usr/bin/php
<?php
// Read the raw input text from standard input.
$in = fopen("php://stdin", "r");
$results = array();
while (($line = fgets($in, 4096)) !== false)
{
    // Split the line on non-word characters and drop empty pieces.
    $words = preg_split('/\W/', $line, 0, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word)
        $results[] = $word;
}
fclose($in);
// Emit one "word<TAB>1" pair per word.
foreach ($results as $key => $value)
{
    print "$value\t1\n";
}

The general idea of this code is to find the words in each line of input text and output each one on its own line, followed by a tab and the number 1, in the form:

hello 1
world 1

It’s basically no different from the PHP you have written before, right? There are two things that may seem a little strange:

PHP as an executable program

The "#!/usr/bin/php" in the first line tells Linux to use the program /usr/bin/php as the interpreter for the code that follows. People who have written Linux shell scripts should be familiar with this notation; the first line of a script usually looks like this: #!/bin/bash, #!/usr/bin/python

With this line in place, and once the file has been made executable (chmod +x wc_mapper.php), you can run wc_mapper.php directly, just like the cat and grep commands: ./wc_mapper.php

Use stdin to receive input

PHP supports several ways of receiving input. The most familiar is reading parameters passed over the web from the $_GET and $_POST superglobals; the second is reading command-line arguments from $_SERVER['argv']. Here, standard input (stdin) is used instead.

In practice it works like this:

Enter ./wc_mapper.php in the Linux console

wc_mapper.php runs, and the console waits for keyboard input from the user

The user enters text via the keyboard

The user presses Ctrl + D to end the input; wc_mapper.php then runs the real business logic and outputs the results

So where is stdout? print already writes to stdout, which is no different from the web programs and CLI scripts we wrote before.

3. Write Reducer

Create a new wc_reducer.php and write the following code:

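#!/usr/bin/php
<?php
// Read the mapper output ("word<TAB>count" per line) from standard input.
$in = fopen("php://stdin", "r");
$results = array();
while (($line = fgets($in, 4096)) !== false)
{
    // Split each line into the word (key) and its count (value).
    list($key, $value) = preg_split("/\t/", trim($line), 2);
    // Start from zero the first time a word is seen, then accumulate.
    if (!isset($results[$key]))
        $results[$key] = 0;
    $results[$key] += $value;
}
fclose($in);
// Sort by word and emit "word<TAB>total".
ksort($results);
foreach ($results as $key => $value)
{
    print "$key\t$value\n";
}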

The main idea of this code is to count how many times each word appears and output each word on its own line, followed by a tab and its total, in the form:

hello 2
world 1
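Before involving Hadoop at all, the mapper-to-reducer data flow can be checked locally with an ordinary pipe. The sketch below uses two hypothetical shell functions, `wc_map` and `wc_reduce`, as stand-ins that mimic what wc_mapper.php and wc_reducer.php do; they are assumptions for illustration, not part of the original scripts. The `sort` between them plays the role of Hadoop's shuffle phase.

```shell
#!/bin/sh
# Hypothetical shell stand-ins for wc_mapper.php and wc_reducer.php.

# Mapper: put each word on its own line, then emit "word<TAB>1".
wc_map() {
    tr -cs '[:alnum:]_' '\n' | awk 'NF { printf "%s\t1\n", $0 }'
}

# Reducer: sum the counts per word and print them sorted by word.
wc_reduce() {
    awk -F '\t' '{ count[$1] += $2 }
        END { for (w in count) printf "%s\t%d\n", w, count[w] }' | LC_ALL=C sort
}

# Local dry run: mapper | sort (stand-in for the shuffle) | reducer.
# Prints "hello<TAB>2" and "world<TAB>1".
printf 'hello world\nhello\n' | wc_map | LC_ALL=C sort | wc_reduce
```

Once the PHP scripts are executable, the same pipe should work with ./wc_mapper.php and ./wc_reducer.php in place of the two functions.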

4. Use Hadoop to run

Upload the sample text to be counted

hadoop fs -put *.TXT /tmp/input

Execute PHP mapreduce program in Streaming mode

hadoop jar hadoop-0.20.2-streaming.jar -input /tmp/input -output /tmp/output -mapper /absolute/path/to/wc_mapper.php -reducer /absolute/path/to/wc_reducer.php

Note:

The input and output directories are paths on HDFS

The mapper and reducer are paths on the local machine. Be sure to write absolute paths, do not write relative paths, otherwise Hadoop will report an error saying that the mapreduce program cannot be found.
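One way to avoid typing absolute paths by hand is to resolve them in the shell before calling Hadoop. This is a minimal sketch: `abs_path` is a hypothetical helper introduced here for illustration, and it assumes the file's directory already exists.

```shell
#!/bin/sh
# Hypothetical helper: print the absolute path of a file whose
# directory exists. Returns non-zero if the directory cannot be entered.
abs_path() {
    # Resolve the directory part with cd/pwd, then re-append the file name.
    dir=$(CDPATH= cd -- "$(dirname -- "$1")" && pwd) || return 1
    # Avoid a double slash when the file sits directly under /.
    [ "$dir" = "/" ] && dir=""
    printf '%s/%s\n' "$dir" "$(basename -- "$1")"
}

# Intended use (not executed here, since it needs a Hadoop installation):
#   hadoop jar hadoop-0.20.2-streaming.jar -input /tmp/input -output /tmp/output \
#       -mapper "$(abs_path wc_mapper.php)" -reducer "$(abs_path wc_reducer.php)"
```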

View results

hadoop fs -cat /tmp/output/part-00000

5. Shell version of Hadoop MapReduce program

#!/bin/bash -

# Load configuration file. It is expected to define hadoop_home,
# log_file_dir_in_hdfs, streaming_jar_path, mapreduce_output_dir,
# mapper, reducer and bucket_list.
source './config.sh'

# Process command line parameters
while getopts "d:" arg
do
    case $arg in
        d)
            date=$OPTARG
            ;;
        ?)
            echo "unknown argument"
            ;;
    esac
done

# The default processing date is yesterday (-v-1d is BSD/macOS date syntax)
default_date=`date -v-1d +%Y-%m-%d`

# Final processing date. If the date format is incorrect, exit execution
date=${date:-${default_date}}
if ! [[ "$date" =~ ^[12][0-9]{3}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$ ]]
then
    echo "invalid date(yyyy-mm-dd): $date"
    exit 1
fi

# Files to be processed
log_files=$(${hadoop_home}bin/hadoop fs -ls ${log_file_dir_in_hdfs} | awk '{print $8}' | grep $date)

# If the number of files to be processed is zero, exit execution
# (grep -c . counts the non-empty lines)
log_files_amount=$(echo "$log_files" | grep -c .)
if [ $log_files_amount -lt 1 ]
then
    echo "no log files found"
    exit 0
fi

# Input file list (Streaming accepts one -input option per path)
for f in $log_files
do
    input_files_list="${input_files_list} -input $f"
done

function map_reduce () {
    if ${hadoop_home}bin/hadoop jar ${streaming_jar_path} ${input_files_list} -output ${mapreduce_output_dir}${date}/${1}/ -mapper "${mapper} ${1}" -reducer "${reducer}" -file "${mapper}"
    then
        echo "streaming job done!"
    else
        exit 1
    fi
}

# Loop through each bucket
for bucket in ${bucket_list[@]}
do
    map_reduce $bucket
done
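The date check in the script is easy to get subtly wrong, so it can help to pull it out into a small function and try it on a few values. The sketch below is one possible way to do that: `is_valid_date` is a hypothetical name, the pattern is anchored at both ends, and it only checks the yyyy-mm-dd format (a string such as 2014-02-31 would still pass).

```shell
#!/bin/sh
# Hypothetical helper: succeed only for a well-formed yyyy-mm-dd string.
# This is a format check, not a calendar check.
is_valid_date() {
    printf '%s\n' "$1" |
        grep -Eq '^[12][0-9]{3}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$'
}

# Try a few candidate values and report which ones are accepted.
for d in 2014-03-05 2014-10-31 2014-13-01 14-03-05
do
    if is_valid_date "$d"
    then
        echo "$d ok"
    else
        echo "$d invalid"
    fi
done
```

The first two values are accepted and the last two are rejected, since 13 is not a month and the year must have four digits.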




http://www.bkjia.com/PHPjc/754798.html
