I recently need to give a talk on YARN, so I wanted to set up a Hadoop environment. Hadoop 2.0 has changed quite a lot compared with Hadoop 0.1x, and it took a fair amount of fiddling before the environment finally worked. I set up a two-node cluster and configured only the parameters that were strictly necessary, just enough to get the cluster running.

1. core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://10.232.42.91:19000/</value>
  </property>
</configuration>

2. mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

3. yarn-site.xml

<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hdfs://10.232.42.91:19001/</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hdfs://10.232.42.91:19002/</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>10.232.42.91:8030</value>
  </property>
</configuration>

Put JAVA_HOME and HADOOP_HOME into .bashrc (a minimal sketch of those lines follows), then run sbin/start-all.sh. Once everything is up, jps shows the following processes on the two nodes.
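For reference, a minimal sketch of the .bashrc lines in question; the install paths are placeholders, not the ones actually used on this cluster.

export JAVA_HOME=/usr/java/default                    # placeholder path
export HADOOP_HOME=/home/admin/hadoop-2.0.3-alpha     # placeholder path
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# On a brand-new cluster the NameNode also has to be formatted once
# before HDFS will come up:
#   $HADOOP_HOME/bin/hdfs namenode -format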

[master] jps
31318 ResourceManager
28981 DataNode
11580 JobHistoryServer
28858 NameNode
29155 SecondaryNameNode
31426 NodeManager
11016 Jps
[slave] jps
12592 NodeManager
11711 DataNode
17699 Jps

The JobHistoryServer shown above has to be started separately; it is where you can see the detailed logs of every application. The start command is as follows.

sbin/mr-jobhistory-daemon.sh start historyserver
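By default the history server listens on port 10020 (RPC) and 19888 (web UI) on whatever host starts it. If you want to pin it to the master, the usual way is two extra properties in mapred-site.xml, added next to mapreduce.framework.name. The host and ports below are assumptions for this particular cluster, not values from the original setup.

<!-- goes inside the existing <configuration> in mapred-site.xml -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>10.232.42.91:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>10.232.42.91:19888</value>
</property>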

Opening http://10.232.42.91:8088/cluster/cluster shows the cluster overview. Note that there is no longer any slot-related data to be seen here.

[Screenshot: the cluster overview page at :8088]

Everything is ready. Put some text files under hdfs://10.232.42.91:19000/input and run wordcount to see the result.
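For completeness, the input directory can be created and populated with something like the following; the local file names are placeholders (the job log below shows three input paths).

$ hadoop fs -mkdir hdfs://10.232.42.91:19000/input
$ hadoop fs -put file1.txt file2.txt file3.txt hdfs://10.232.42.91:19000/input/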

$ cd hadoop/share/hadoop/mapreduce
$ hadoop jar hadoop-mapreduce-examples-2.0.3-alpha.jar wordcount hdfs://10.232.42.91:19000/input hdfs://10.232.42.91:19000/output
13/03/07 21:08:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/03/07 21:08:26 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/03/07 21:08:26 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/03/07 21:08:26 INFO input.FileInputFormat: Total input paths to process : 3
13/03/07 21:08:26 INFO mapreduce.JobSubmitter: number of splits:3
13/03/07 21:08:26 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/03/07 21:08:26 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/03/07 21:08:26 WARN conf.Configuration: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
13/03/07 21:08:26 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
13/03/07 21:08:26 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/03/07 21:08:26 WARN conf.Configuration: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
13/03/07 21:08:26 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/03/07 21:08:26 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/03/07 21:08:26 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/03/07 21:08:26 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/03/07 21:08:26 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/03/07 21:08:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1362658309553_0019
13/03/07 21:08:26 INFO client.YarnClientImpl: Submitted application application_1362658309553_0019 to ResourceManager at /10.232.42.91:19001
13/03/07 21:08:26 INFO mapreduce.Job: The url to track the job: http://search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0019/
13/03/07 21:08:26 INFO mapreduce.Job: Running job: job_1362658309553_0019
13/03/07 21:08:33 INFO mapreduce.Job: Job job_1362658309553_0019 running in uber mode : false
13/03/07 21:08:33 INFO mapreduce.Job:  map 0% reduce 0%
13/03/07 21:08:39 INFO mapreduce.Job:  map 100% reduce 0%
13/03/07 21:08:44 INFO mapreduce.Job:  map 100% reduce 100%
13/03/07 21:08:44 INFO mapreduce.Job: Job job_1362658309553_0019 completed successfully
13/03/07 21:08:44 INFO mapreduce.Job: Counters: 43
	File System Counters
		FILE: Number of bytes read=12698
		FILE: Number of bytes written=312593
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=16947
		HDFS: Number of bytes written=8739
		HDFS: Number of read operations=12
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=3
		Launched reduce tasks=1
		Rack-local map tasks=3
		Total time spent by all maps in occupied slots (ms)=10750
		Total time spent by all reduces in occupied slots (ms)=4221
	Map-Reduce Framework
		Map input records=317
		Map output records=2324
		Map output bytes=24586
		Map output materialized bytes=12710
		Input split bytes=316
		Combine input records=2324
		Combine output records=885
		Reduce input groups=828
		Reduce shuffle bytes=12710
		Reduce input records=885
		Reduce output records=828
		Spilled Records=1770
		Shuffled Maps =3
		Failed Shuffles=0
		Merged Map outputs=3
		GC time elapsed (ms)=376
		CPU time spent (ms)=4480
		Physical memory (bytes) snapshot=557428736
		Virtual memory (bytes) snapshot=2105122816
		Total committed heap usage (bytes)=254607360
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=16631
	File Output Format Counters 
		Bytes Written=8739

Next, let's play with YARN itself. The official Hadoop document "Writing YARN Applications" is painful to read; fortunately, distributedshell is itself written on top of YARN, so to study YARN you can simply go into the Hadoop source tree and read that code.
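The core of that Client boils down to a handful of YarnClient calls. Below is a minimal sketch of the submission flow, written against the later stable client API (org.apache.hadoop.yarn.client.api.YarnClient); in 2.0.3-alpha the same steps go through YarnClientImpl, which is what the log lines show. The application name, memory figure, and echo command here are made up purely for illustration, not taken from distributedshell.

import java.util.Collections;

import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class MiniYarnSubmit {
    public static void main(String[] args) throws Exception {
        // Talk to the ResourceManager configured in yarn-site.xml
        YarnConfiguration conf = new YarnConfiguration();
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // Ask the RM for a new application id
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("mini-submit-demo");

        // Describe how to launch the ApplicationMaster container;
        // a real AM is a jar with its own logic, here it is just an echo
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        amContainer.setCommands(Collections.singletonList(
                "echo hello-from-am 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout"));
        appContext.setAMContainerSpec(amContainer);

        // Memory to reserve for the AM container
        Resource capability = Records.newRecord(Resource.class);
        capability.setMemory(1024);
        appContext.setResource(capability);

        // Hand everything to the RM and let it schedule the AM
        ApplicationId appId = yarnClient.submitApplication(appContext);
        System.out.println("Submitted " + appId);
    }
}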

$ hadoop jar hadoop-yarn-applications-distributedshell-2.0.3-alpha.jar --jar hadoop-yarn-applications-distributedshell-2.0.3-alpha.jar org.apache.hadoop.yarn.applications.distributedshell.Client --shell_command uname --shell_args '-a'
13/03/07 21:42:44 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/03/07 21:42:44 INFO distributedshell.Client: Initializing Client
13/03/07 21:42:44 INFO distributedshell.Client: Running Client
13/03/07 21:42:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/03/07 21:42:44 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/03/07 21:42:44 INFO distributedshell.Client: Got Cluster metric info from ASM, numNodeManagers=2
13/03/07 21:42:44 INFO distributedshell.Client: Got Cluster node info from ASM
13/03/07 21:42:44 INFO distributedshell.Client: Got node report from ASM for, nodeId=search042091.sqa.cm4:39557, nodeAddresssearch042091.sqa.cm4:8042, nodeRackName/default-rack, nodeNumContainers0, nodeHealthStatusis_node_healthy: true, health_report: "", last_health_report_time: 1362663711950, 
13/03/07 21:42:44 INFO distributedshell.Client: Got node report from ASM for, nodeId=search041134.sqa.cm4:49313, nodeAddresssearch041134.sqa.cm4:8042, nodeRackName/default-rack, nodeNumContainers0, nodeHealthStatusis_node_healthy: true, health_report: "", last_health_report_time: 1362663712038, 
13/03/07 21:42:44 INFO distributedshell.Client: Queue info, queueName=default, queueCurrentCapacity=0.0, queueMaxCapacity=1.0, queueApplicationCount=17, queueChildQueueCount=0
13/03/07 21:42:44 INFO distributedshell.Client: User ACL Info for Queue, queueName=root, userAcl=SUBMIT_APPLICATIONS
13/03/07 21:42:44 INFO distributedshell.Client: User ACL Info for Queue, queueName=root, userAcl=ADMINISTER_QUEUE
13/03/07 21:42:44 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=SUBMIT_APPLICATIONS
13/03/07 21:42:44 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=ADMINISTER_QUEUE
13/03/07 21:42:44 INFO distributedshell.Client: Min mem capabililty of resources in this cluster 1024
13/03/07 21:42:44 INFO distributedshell.Client: Max mem capabililty of resources in this cluster 8192
13/03/07 21:42:44 INFO distributedshell.Client: AM memory specified below min threshold of cluster. Using min value., specified=10, min=1024
13/03/07 21:42:44 INFO distributedshell.Client: Setting up application submission context for ASM
13/03/07 21:42:44 INFO distributedshell.Client: Copy App Master jar from local filesystem and add to local environment
13/03/07 21:42:45 INFO distributedshell.Client: Set the environment for the application master
13/03/07 21:42:45 INFO distributedshell.Client: Setting up app master command
13/03/07 21:42:45 INFO distributedshell.Client: Completed setting up app master command ${JAVA_HOME}/bin/java -Xmx1024m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 10 --num_containers 1 --priority 0 --shell_command uname --shell_args -a --debug 1><log_dir>/AppMaster.stdout 2><log_dir>/AppMaster.stderr 
13/03/07 21:42:45 INFO distributedshell.Client: Submitting application to ASM
13/03/07 21:42:45 INFO client.YarnClientImpl: Submitted application application_1362658309553_0020 to ResourceManager at /10.232.42.91:19001
13/03/07 21:42:46 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao
13/03/07 21:42:47 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao
13/03/07 21:42:48 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao
13/03/07 21:42:49 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao
13/03/07 21:42:50 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao
13/03/07 21:42:51 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao
13/03/07 21:42:52 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=FINISHED, distributedFinalState=SUCCEEDED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao
13/03/07 21:42:52 INFO distributedshell.Client: Application has completed successfully. Breaking monitoring loop
13/03/07 21:42:52 INFO distributedshell.Client: Application completed successfully

After the run completed, I couldn't find the output anywhere; after a lot of searching I finally found it under hadoop/logs/userlogs. At first I didn't understand why two containers had run: the first container hosts the ApplicationMaster itself, and the second is the one that actually executes the shell command.

$ tree hadoop/logs/userlogs/application_1362658309553_0018
application_1362658309553_0018
|-- container_1362658309553_0018_01_000001
|   |-- AppMaster.stderr
|   `-- AppMaster.stdout
`-- container_1362658309553_0018_01_000002
    |-- stderr
    `-- stdout
$ cat hadoop/logs/userlogs/application_1362658309553_0018/container_1362658309553_0018_01_000002/stdout
Linux search042091.sqa.cm4 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
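Digging through hadoop/logs/userlogs only works on the node that actually ran the container. If log aggregation is turned on (yarn.log-aggregation-enable set to true in yarn-site.xml, which is not configured above, so this is an assumption), the same output can also be pulled from any machine with the yarn CLI:

$ yarn logs -applicationId application_1362658309553_0018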

OK, time to schedule a program of my own through YARN. I wrote a script that starts a server.

$ cat ~/start_sp.sh 
#!/usr/bin/env bash
source /home/admin/.bashrc
/home/admin/sp/bin/sap_server -c /home/admin/sp/sp_worker/etc/sap_server_app.cfg -l /home/admin/sp/sp_worker/etc/sap_server_log.cfg -k restart
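The post doesn't show the submission command for this script; presumably it was launched through distributedshell the same way as the uname example, along these lines (the script path is a guess):

$ hadoop jar hadoop-yarn-applications-distributedshell-2.0.3-alpha.jar --jar hadoop-yarn-applications-distributedshell-2.0.3-alpha.jar org.apache.hadoop.yarn.applications.distributedshell.Client --shell_command /home/admin/start_sp.sh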

Once it is started, the process tree looks like this.

[Screenshot: process tree after start_sp.sh launches sap_server]

Then I killed the script's process directly, expecting YARN to restart it for me. Instead, the application simply finished, and AppMaster.stderr contained the following.

13/03/08 21:40:02 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, completedCnt=1
13/03/08 21:40:02 INFO distributedshell.ApplicationMaster: Got container status for containerID=container_1362747551045_0017_01_000002, state=COMPLETE, exitStatus=137, diagnostics=
Killed by external signal
13/03/08 21:40:02 INFO distributedshell.ApplicationMaster: Current application state: loop=464, appDone=true, total=1, requested=1, completed=1, failed=1, currentAllocated=1
13/03/08 21:40:02 INFO distributedshell.ApplicationMaster: Application completed. Signalling finish to RM
13/03/08 21:40:02 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.AMRMClientImpl is stopped.
13/03/08 21:40:02 INFO distributedshell.ApplicationMaster: Application Master failed. exiting
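The exit status 137 is 128 + 9, i.e. the container was killed by SIGKILL, and the stock distributedshell ApplicationMaster simply counts that container as failed and finishes instead of asking for a replacement. Getting the restart behaviour I wanted would mean writing an AM that re-requests a container when one dies. A rough sketch of that idea follows, using the later stable async AM-RM API (org.apache.hadoop.yarn.client.api.async) rather than the AMRMClientImpl seen in the log, with made-up memory and priority values; a real AM would also have to relaunch the command on the newly allocated container.

import java.util.List;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;
import org.apache.hadoop.yarn.util.Records;

class RestartingHandler implements AMRMClientAsync.CallbackHandler {
    private final AMRMClientAsync<ContainerRequest> amRMClient;

    RestartingHandler(AMRMClientAsync<ContainerRequest> amRMClient) {
        this.amRMClient = amRMClient;
    }

    @Override
    public void onContainersCompleted(List<ContainerStatus> statuses) {
        for (ContainerStatus status : statuses) {
            // 0 means a clean exit; anything else (137 = SIGKILL above) gets retried
            if (status.getExitStatus() != 0) {
                Resource capability = Records.newRecord(Resource.class);
                capability.setMemory(1024);
                Priority priority = Records.newRecord(Priority.class);
                priority.setPriority(0);
                amRMClient.addContainerRequest(
                        new ContainerRequest(capability, null, null, priority));
            }
        }
    }

    @Override
    public void onContainersAllocated(List<Container> containers) {
        // in a real AM, launch the shell command again here via NMClient
    }

    @Override public void onShutdownRequest() { }
    @Override public void onNodesUpdated(List<NodeReport> updated) { }
    @Override public void onError(Throwable e) { }
    @Override public float getProgress() { return 0f; }
}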