Run spark-1.6.0 on Yarn
Table of Contents
1. Agreement
2. Install Scala
2.1. Download
2.2. Installation
2.3. Set environment variables
3. Install Spark
3.1. Download
3.2. Installation
3.3. Configuration
3.3.1. Modify conf/spark-env.sh
4. Start Spark
4.1. Run the built-in example
4.2. Spark SQL CLI
5. Integrate with Hive
6. Common errors
6.1. Error 1: unknown queue: thequeue
6.2. SPARK_CLASSPATH was detected
7. Related documents
1. Agreement
This article assumes that Hadoop 2.7.1 is installed in /data/hadoop/current and that Spark 1.6.0 is installed in /data/hadoop/spark, where /data/hadoop/spark is a soft link to /data/hadoop/spark-1.6.0-bin-hadoop2.6.
Spark’s official website is http://spark.apache.org/ (Shark’s official website is http://shark.cs.berkeley.edu/; Shark has become a module of Spark and no longer needs to be installed separately).
This article runs Spark in cluster mode and does not cover client mode.
2. Install Scala
Martin Odersky of the École Polytechnique Fédérale de Lausanne (EPFL) began designing Scala in 2001, building on his earlier work on Funnel.
Scala is a multi-paradigm programming language, designed to integrate various features of pure object-oriented programming and functional programming. It runs on the Java virtual machine JVM, is compatible with existing Java programs, and can call Java class libraries. Scala includes a compiler and class libraries and is released under the BSD license.
2.1. Download
Spark is developed in Scala, so Scala must be installed on each node before installing Spark. Scala’s official website is http://www.scala-lang.org/, and the download URL is http://www.scala-lang.org/download/. This article downloads the binary installation package scala-2.11.7.tgz.
2.2. Installation
This article installs Scala in /data/scala as the root user (a non-root user also works; planning this in advance is recommended), where /data/scala is a soft link to /data/scala-2.11.7.
The installation is very simple: upload scala-2.11.7.tgz to the /data directory, then decompress it there.
Next, create a soft link: ln -s /data/scala-2.11.7 /data/scala.
2.3. Set environment variables
After Scala is installed, you need to add it to the PATH environment variable. You can directly modify the /etc/profile file and add the following content:
export SCALA_HOME=/data/scala
export PATH=$SCALA_HOME/bin:$PATH
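To check that the two lines take effect, the PATH can be inspected after they are applied; a minimal sketch, assuming this article's /data/scala install path (in a real session, run `source /etc/profile` instead of re-exporting by hand):

```shell
# Apply the two profile lines and confirm Scala's bin directory is on the
# PATH (after this, `scala -version` should work on a real install).
export SCALA_HOME=/data/scala
export PATH=$SCALA_HOME/bin:$PATH
case ":$PATH:" in
  *":$SCALA_HOME/bin:"*) echo "PATH updated" ;;
  *) echo "PATH not updated" ;;
esac
```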
3. Install Spark
Spark is installed as a non-root user. This article installs it as the hadoop user.
3.1. Download
This article downloads the binary installation package; this method is recommended, otherwise you would have to deal with compilation yourself. The download URL is http://spark.apache.org/downloads.html. This article downloads spark-1.6.0-bin-hadoop2.6.tgz, which can run directly on YARN.
3.2. Installation
1) Upload spark-1.6.0-bin-hadoop2.6.tgz to the directory /data/hadoop
2) Unzip: tar xzf spark-1.6.0-bin-hadoop2.6.tgz
3) Create a soft link: ln -s spark-1.6.0-bin-hadoop2.6 spark
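The link in step 3 is relative, so it resolves inside /data/hadoop. The layout can be tried safely against a scratch directory (the directory below merely stands in for the extracted tarball):

```shell
# Simulate steps 2-3 in a throwaway directory; in the real install the
# working directory is /data/hadoop and the directory comes from the tarball.
d=$(mktemp -d)
mkdir "$d/spark-1.6.0-bin-hadoop2.6"   # stands in for the extracted tarball
cd "$d"
ln -s spark-1.6.0-bin-hadoop2.6 spark  # step 3: relative soft link
readlink spark                         # prints spark-1.6.0-bin-hadoop2.6
```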
To run Spark on YARN, you do not need to install Spark on every machine; installing it on just one machine is enough. However, Spark can only be run from a machine where it is installed, for the simple reason that the files that invoke Spark are needed there.
3.3. Configuration
3.3.1. Modify conf/spark-env.sh
You can make a copy of spark-env.sh.template as spark-env.sh, then add the following content:
HADOOP_CONF_DIR=/data/hadoop/current/etc/hadoop
YARN_CONF_DIR=/data/hadoop/current/etc/hadoop
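Putting the configuration step together, a minimal sketch (a temp directory stands in for /data/hadoop/spark/conf here so the commands can be tried safely):

```shell
# Create spark-env.sh from the template and append the two variables.
conf=$(mktemp -d)
touch "$conf/spark-env.sh.template"    # ships with the Spark tarball
cp "$conf/spark-env.sh.template" "$conf/spark-env.sh"
cat >> "$conf/spark-env.sh" <<'EOF'
HADOOP_CONF_DIR=/data/hadoop/current/etc/hadoop
YARN_CONF_DIR=/data/hadoop/current/etc/hadoop
EOF
grep CONF_DIR "$conf/spark-env.sh"
```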
4. Start Spark
Since Spark runs on YARN, there is no separate step to start Spark. Instead, when the spark-submit command is executed, the Spark job is scheduled to run by YARN.
4.1. Run the built-in example
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn --deploy-mode cluster \
  --driver-memory 4g \
  --executor-memory 2g \
  --executor-cores 1 \
  --queue default \
  lib/spark-examples*.jar 10
Output:
16/02/03 16:08:33 INFO yarn.Client: Application report for application_1454466109748_0007 (state: RUNNING)
16/02/03 16:08:34 INFO yarn.Client: Application report for application_1454466109748_0007 (state: RUNNING)
16/02/03 16:08:35 INFO yarn.Client: Application report for application_1454466109748_0007 (state: RUNNING)
16/02/03 16:08:36 INFO yarn.Client: Application report for application_1454466109748_0007 (state: RUNNING)
16/02/03 16:08:37 INFO yarn.Client: Application report for application_1454466109748_0007 (state: RUNNING)
16/02/03 16:08:38 INFO yarn.Client: Application report for application_1454466109748_0007 (state: RUNNING)
16/02/03 16:08:39 INFO yarn.Client: Application report for application_1454466109748_0007 (state: RUNNING)
16/02/03 16:08:40 INFO yarn.Client: Application report for application_1454466109748_0007 (state: FINISHED)
16/02/03 16:08:40 INFO yarn.Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: 10.225.168.251
     ApplicationMaster RPC port: 0
     queue: default
     start time: 1454486904755
     final status: SUCCEEDED
     tracking URL: http://hadoop-168-254:8088/proxy/application_1454466109748_0007/
     user: hadoop
16/02/03 16:08:40 INFO util.ShutdownHookManager: Shutdown hook called
16/02/03 16:08:40 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-7fc8538c-8f4c-4d8d-8731-64f5c54c5eac
4.2. Spark SQL CLI
You can enter the Spark SQL CLI interactive interface by running the command below. To run it on YARN, specify --master yarn (note that --deploy-mode cluster is not supported; the CLI can only run on YARN in client mode):
./bin/spark-sql --master yarn
Why can the Spark SQL CLI only run in client mode? It is actually easy to understand: since the CLI is interactive and you need to see its output locally, cluster mode cannot provide that, because in cluster mode the machine on which the ApplicationMaster runs is determined dynamically by YARN.
5. Integrate with Hive
Integrating Spark with Hive is very simple, requiring just the following steps:
1) Add HIVE_HOME to spark-env.sh, for example: export HIVE_HOME=/data/hadoop/hive
2) Copy Hive’s hive-site.xml and hive-log4j.properties files to Spark’s conf directory.
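The two steps above can be sketched as follows (a scratch directory stands in for /data/hadoop so the commands can be tried safely; the real paths are this article's conventions):

```shell
# Step 1: point Spark at Hive via spark-env.sh.
# Step 2: copy Hive's config files into Spark's conf directory.
# mktemp simulates the /data/hadoop layout from this article.
prefix=$(mktemp -d)
mkdir -p "$prefix/hive/conf" "$prefix/spark/conf"
touch "$prefix/hive/conf/hive-site.xml" "$prefix/hive/conf/hive-log4j.properties"

echo "export HIVE_HOME=$prefix/hive" >> "$prefix/spark/conf/spark-env.sh"  # step 1
cp "$prefix/hive/conf/hive-site.xml" \
   "$prefix/hive/conf/hive-log4j.properties" "$prefix/spark/conf/"         # step 2
ls "$prefix/spark/conf"
```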
After completion, execute spark-sql again to enter Spark's SQL CLI, and run the command show tables to see the tables created in Hive.
Example:
./spark-sql --master yarn --driver-class-path /data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar
6. Common Errors
6.1. Error 1: unknown queue: thequeue
Run:
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn --deploy-mode cluster \
  --driver-memory 4g --executor-memory 2g --executor-cores 1 \
  --queue thequeue \
  lib/spark-examples*.jar 10
If it reports the error below, just change "--queue thequeue" to "--queue default".
16/02/03 15:57:36 INFO yarn.Client: Application report for application_1454466109748_0004 (state: FAILED)
16/02/03 15:57:36 INFO yarn.Client:
     client token: N/A
     diagnostics: Application application_1454466109748_0004 submitted by user hadoop to unknown queue: thequeue
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: thequeue
     start time: 1454486255907
     final status: FAILED
     tracking URL: http://hadoop-168-254:8088/proxy/application_1454466109748_0004/
     user: hadoop
16/02/03 15:57:36 INFO yarn.Client: Deleting staging directory .sparkStaging/application_1454466109748_0004
Exception in thread "main" org.apache.spark.SparkException: Application application_1454466109748_0004 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1029)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1076)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/02/03 15:57:36 INFO util.ShutdownHookManager: Shutdown hook called
16/02/03 15:57:36 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-54531ae3-4d02-41be-8b9e-92f4b0f05807
6.2. SPARK_CLASSPATH was detected
SPARK_CLASSPATH was detected (set to '/data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar:').
This is deprecated in Spark 1.0+.
Please instead use:
- ./spark-submit with --driver-class-path to augment the driver classpath
- spark.executor.extraClassPath to augment the executor classpath
This means it is not recommended to set the environment variable SPARK_CLASSPATH in spark-env.sh; use the recommended method instead:
./spark-sql --master yarn --driver-class-path /data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar
7. Related documents
"HBase-0.98.0 Distributed Installation Guide"
"Hive 0.12.0 Installation Guide"
"ZooKeeper-3.4.6 Distributed Installation Guide"
"Hadoop 2.3.0 Source Code Reverse Engineering"
"Compiling Hadoop-2.4.0 on Linux"
"Accumulo-1.5.1 Installation Guide"
"Drill 1.0.0 Installation Guide"
"Shark 0.9.1 Installation Guide"
For more, follow the technical blog: http://aquester.culog.cn.