
Configuring Hadoop rack awareness



周海汉 2013.7.24

http://abloz.com

Suppose the network links form three levels: a top-level switch d1 connects to several switches rk1, rk2, rk3, rk4, …, and each of those switches serves one rack:

d1(rk1(hs11,hs12,…),rk2(hs21,hs22,…), rk3(hs31,hs32,…),rk4(hs41,hs42,…),…)

The mapping from host to rack can be produced by a program or script. For example, use Python to generate a topology.py:
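As the comment in the script below notes, the per-host entries can be generated rather than typed out by hand. A minimal sketch, assuming the layout above (hosts hs11–hs46, ten per rack; `build_rack_map` is a hypothetical helper, not part of the original script):

```python
# Generate "hsN" -> "/rkM" entries for hosts hs11..hs46,
# ten hosts per rack: hs11-hs20 -> /rk1, hs21-hs30 -> /rk2, ...
def build_rack_map(first=11, last=46, hosts_per_rack=10):
    return {"hs%d" % i: "/rk%d" % ((i - 1) // hosts_per_rack)
            for i in range(first, last + 1)}

rack_map = build_rack_map()
print(rack_map["hs11"])  # /rk1
print(rack_map["hs21"])  # /rk2
```

The IP-address entries can be generated the same way from the last octet.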

Then configure it in core-site.xml:

<property>
  <name>topology.script.file.name</name>
  <value>/home/hadoop/hadoop-1.1.2/conf/topology.py</value>
  <description>The script name that should be invoked to resolve DNS names to
  NetworkTopology names. Example: the script would take host.foo.bar as an
  argument, and return /rack1 as the output.</description>
</property>

The Python rack-mapping script:

[hadoop@hs11 conf]$ cat topology.py
#!/usr/bin/env python

'''
This script is used by hadoop to determine network/rack topology. It
should be specified in hadoop-site.xml via the topology.script.file.name
property.
topology.script.file.name
/home/hadoop/hadoop-1.1.2/conf/topology.py

To generate the dict:
for i in range(xx):
#print '"hs%d":"/rk%d/hs%d",'%(i,(i-1)/10,i)
print '"hs%d":"/rk%d",'%(i,(i-1)/10)

Andy 2013.7.23
'''

import sys
from string import join

DEFAULT_RACK = '/rk0'

RACK_MAP = {
"hs11":"/rk1",
"hs12":"/rk1",
"hs13":"/rk1",
"hs14":"/rk1",
"hs15":"/rk1",
"hs16":"/rk1",
"hs17":"/rk1",
"hs18":"/rk1",
"hs19":"/rk1",
"hs20":"/rk1",
"hs21":"/rk2",
"hs22":"/rk2",
"hs23":"/rk2",
"hs24":"/rk2",
"hs25":"/rk2",
"hs26":"/rk2",
"hs27":"/rk2",
"hs28":"/rk2",
"hs29":"/rk2",
"hs30":"/rk2",
"hs31":"/rk3",
"hs32":"/rk3",
"hs33":"/rk3",
"hs34":"/rk3",
"hs35":"/rk3",
"hs36":"/rk3",
"hs37":"/rk3",
"hs38":"/rk3",
"hs39":"/rk3",
"hs40":"/rk3",
"hs41":"/rk4",
"hs42":"/rk4",
"hs43":"/rk4",
"hs44":"/rk4",
"hs45":"/rk4",
"hs46":"/rk4",

"10.10.20.11":"/rk1",
"10.10.20.12":"/rk1",
"10.10.20.13":"/rk1",
"10.10.20.14":"/rk1",
"10.10.20.15":"/rk1",
"10.10.20.16":"/rk1",
"10.10.20.17":"/rk1",
"10.10.20.18":"/rk1",
"10.10.20.19":"/rk1",
"10.10.20.20":"/rk1",
"10.10.20.21":"/rk2",
"10.10.20.22":"/rk2",
"10.10.20.23":"/rk2",
"10.10.20.24":"/rk2",
"10.10.20.25":"/rk2",
"10.10.20.26":"/rk2",
"10.10.20.27":"/rk2",
"10.10.20.28":"/rk2",
"10.10.20.29":"/rk2",
"10.10.20.30":"/rk2",
"10.10.20.31":"/rk3",
"10.10.20.32":"/rk3",
"10.10.20.33":"/rk3",
"10.10.20.34":"/rk3",
"10.10.20.35":"/rk3",
"10.10.20.36":"/rk3",
"10.10.20.37":"/rk3",
"10.10.20.38":"/rk3",
"10.10.20.39":"/rk3",
"10.10.20.40":"/rk3",
"10.10.20.41":"/rk4",
"10.10.20.42":"/rk4",
"10.10.20.43":"/rk4",
"10.10.20.44":"/rk4",
"10.10.20.45":"/rk4",
"10.10.20.46":"/rk4",
}

if len(sys.argv)==1:
    print DEFAULT_RACK
else:
    print join([RACK_MAP.get(i, DEFAULT_RACK) for i in sys.argv[1:]], " ")
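The script above is Python 2 only: it uses `print` statements and `string.join`, both of which were removed in Python 3. On a cluster where only Python 3 is available, the same lookup logic might look like this (a sketch with an abbreviated map, not the original script):

```python
#!/usr/bin/env python3
import sys

DEFAULT_RACK = "/rk0"

# Same idea as the full RACK_MAP above, abbreviated here for illustration.
RACK_MAP = {
    "hs11": "/rk1",
    "10.10.20.11": "/rk1",
    "hs21": "/rk2",
    "10.10.20.21": "/rk2",
}

def resolve(args):
    # Hadoop passes one or more hostnames/IPs as arguments and expects
    # one rack path per argument, space-separated, on stdout.
    if not args:
        return DEFAULT_RACK
    return " ".join(RACK_MAP.get(a, DEFAULT_RACK) for a in args)

if __name__ == "__main__":
    print(resolve(sys.argv[1:]))
```

Either version can be checked by hand by running it with a few hostnames as arguments before pointing topology.script.file.name at it.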

Originally I had the script return entries like

"hs11":"/rk1/hs11",

and running a MapReduce job then failed with the following error:

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201307241502_0003, Tracking URL = http://hs11:50030/jobdetails.jsp?jobid=job_201307241502_0003
Kill Command = /home/hadoop/hadoop-1.1.2/libexec/../bin/hadoop job -kill job_201307241502_0003
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2013-07-24 18:38:11,854 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201307241502_0003 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://hs11:50030/jobdetails.jsp?jobid=job_201307241502_0003
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

Opening http://hs11:50030/jobdetails.jsp?jobid=job_201307241502_0002 shows:

Job initialization failed:

java.lang.NullPointerException
at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2751)
at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:578)
at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:750)
at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3775)
at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:90)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

It turns out that when rack awareness is configured, the script must not include the device or host name in its output; the system appends it automatically. After switching to the topology.py above, everything ran correctly.
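The failure makes sense once you consider how Hadoop builds a node's topology path: it appends the host name itself to whatever rack path the script returns, so a script that already includes the host yields a doubled, deeper-than-expected path. A rough illustration of that composition (`topology_path` is a hypothetical helper, not Hadoop's actual code):

```python
def topology_path(script_output, host):
    # Hadoop appends the host itself to the rack returned by the script.
    return script_output + "/" + host

# Correct: the script returns only the rack.
print(topology_path("/rk1", "hs11"))       # /rk1/hs11

# Wrong: the script returns rack + host, so the host appears twice.
print(topology_path("/rk1/hs11", "hs11"))  # /rk1/hs11/hs11
```

The doubled path puts hosts at an unexpected depth in the topology tree, which is what trips up JobTracker.resolveAndAddToTopology above.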
