Run spark-1.6.0 on Yarn

Contents
1. Conventions
2. Install Scala
2.1. Download
2.2. Install
2.3. Set environment variables
3. Install Spark
3.1. Download
3.2. Install
3.3. Configuration
3.3.1. Modify conf/spark-env.sh
4. Start Spark
4.1. Run the built-in example
4.2. Spark SQL CLI
5. Integrate with Hive
6. Common errors
6.1. Error 1: unknown queue: thequeue
6.2. SPARK_CLASSPATH was detected
7. Related documents
1. Conventions
This article assumes that Hadoop 2.7.1 is installed in /data/hadoop/current and that Spark 1.6.0 is installed in /data/hadoop/spark, where /data/hadoop/spark is a soft link to the directory in which the Spark 1.6.0 package was actually unpacked.
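A minimal sketch of creating this layout; the unpacked directory name below is only a placeholder, since the exact package directory is not specified here:
# /data/hadoop/spark is a soft link to the directory where the Spark 1.6.0 package was unpacked
# (the target directory name is a placeholder)
ln -s /data/hadoop/spark-1.6.0-bin-hadoopX.Y /data/hadoop/spark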
Spark’s official website is http://spark.apache.org/ (Shark’s official website is http://shark.cs.berkeley.edu/; Shark has become a module of Spark and no longer needs to be installed separately).
This article runs Spark in cluster mode and does not cover client mode.
2. Install Scala
Martin Odersky of the Ecole Polytechnique Fédérale de Lausanne (EPFL) started designing Scala in 2001 based on the work of Funnel.
Scala is a multi-paradigm programming language, designed to integrate various features of pure object-oriented programming and functional programming. It runs on the Java virtual machine JVM, is compatible with existing Java programs, and can call Java class libraries. Scala includes a compiler and class libraries and is released under the BSD license.
2.1. Download
Spark is developed in Scala, so before installing Spark, Scala must be installed on every node. Scala’s official website is http://www.scala-lang.org/, and the download URL is http://www.scala-lang.org/download/. This article downloads the binary package scala-2.11.7.tgz.
2.2. Installation
This article uses the root user to install Scala in /data/scala (a non-root user also works; it is recommended to plan this in advance), where /data/scala is a soft link to /data/scala-2.11.7.
The installation method is very simple: upload scala-2.11.7.tgz to the /data directory, and then decompress it in /data.
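Concretely, the unpack step looks like this:
# unpack the Scala binary package in /data
cd /data
tar xzf scala-2.11.7.tgz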
Next, create a soft link: ln -s /data/scala-2.11.7 /data/scala.
2.3. Set environment variables
After Scala is installed, you need to add it to the PATH environment variable. You can directly modify the /etc/profile file and add the following content:
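A minimal sketch of what to add, assuming Scala is installed under /data/scala as above:
export SCALA_HOME=/data/scala
export PATH=$SCALA_HOME/bin:$PATH
After editing, run "source /etc/profile" (or log in again) so the change takes effect in the current shell.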
3. Install Spark
3.3. Configuration
3.3.1. Modify conf/spark-env.sh
You can make a copy of spark-env.sh.template and then add the following content to it:
HADOOP_CONF_DIR=/data/hadoop/current/etc/hadoop
YARN_CONF_DIR=/data/hadoop/current/etc/hadoop
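A concrete way to do this from the Spark installation directory, using the paths from the conventions above:
cd /data/hadoop/spark
cp conf/spark-env.sh.template conf/spark-env.sh
# append the two settings so Spark can find the Hadoop/YARN configuration
cat >> conf/spark-env.sh <<'EOF'
HADOOP_CONF_DIR=/data/hadoop/current/etc/hadoop
YARN_CONF_DIR=/data/hadoop/current/etc/hadoop
EOF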
4. Start Spark
4.1. Run the built-in example
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn --deploy-mode cluster \
  --driver-memory 4g --executor-memory 2g \
  --executor-cores 1 --queue default \
  lib/spark-examples*.jar 10
Run output: because the example is submitted with --deploy-mode cluster, the driver runs inside YARN, so its result line ("Pi is roughly ...") ends up in the application's container logs rather than on the submitting terminal.
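One way to locate that output, assuming YARN log aggregation is enabled (the application ID below is a placeholder):
# find the application ID of the finished job
yarn application -list -appStates FINISHED
# dump its aggregated logs; the driver (ApplicationMaster) log contains the "Pi is roughly ..." line
yarn logs -applicationId application_1453100000000_0001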
4.2. Spark SQL CLI
./bin/spark-sql --master yarn
Why can the Spark SQL CLI only run in client mode? This is actually easy to understand: since the CLI is interactive and you need to see its output, cluster mode cannot satisfy this, because in cluster mode the machine on which the ApplicationMaster (and therefore the driver) runs is determined dynamically by YARN.
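In other words, spark-sql effectively always uses YARN client deploy mode; asking for cluster deploy mode is rejected by spark-submit. A quick illustration:
# works: the driver, and thus the interactive prompt, stays on the submitting machine
./bin/spark-sql --master yarn --deploy-mode client
# rejected: spark-submit refuses cluster deploy mode for the SQL shell
./bin/spark-sql --master yarn --deploy-mode cluster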
5. Integrate with Hive
Integrating Spark with Hive is very simple; only the following steps are needed (a combined sketch follows the list):
1) Add HIVE_HOME to spark-env.sh, such as: export HIVE_HOME=/data/hadoop/hive
2) Copy Hive’s hive-site.xml and hive-log4j.properties files to Spark’s conf directory.
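A combined sketch of the two steps above, assuming Hive's configuration files live under /data/hadoop/hive/conf:
# 1) point Spark at the Hive installation
echo 'export HIVE_HOME=/data/hadoop/hive' >> /data/hadoop/spark/conf/spark-env.sh
# 2) copy Hive's configuration into Spark's conf directory
cp /data/hadoop/hive/conf/hive-site.xml /data/hadoop/hive/conf/hive-log4j.properties /data/hadoop/spark/conf/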
After completion, execute spark-sql again to enter Spark's SQL CLI, and run the command "show tables" to see the tables created in Hive.
Example:
./spark-sql --master yarn --driver-class-path /data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar
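Once the CLI is up, a quick sanity check looks like this (the table name is only a placeholder for one of your Hive tables):
spark-sql> show tables;
spark-sql> select count(1) from some_hive_table;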
6. Common Errors
6.1. Error 1: unknown queue: thequeue
Run:
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn --deploy-mode cluster \
  --driver-memory 4g --executor-memory 2g \
  --executor-cores 1 --queue thequeue \
  lib/spark-examples*.jar 10
the following error is reported, because the queue named thequeue does not exist in the YARN scheduler configuration. Simply change "--queue thequeue" to "--queue default".
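The value passed to --queue must name a queue that actually exists in YARN; one way to verify with the stock YARN CLI:
# show the state and capacity of a queue (here, the default queue)
yarn queue -status default
The configured queues are also visible on the ResourceManager web UI under the Scheduler page.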
6.2. SPARK_CLASSPATH was detected
SPARK_CLASSPATH was detected (set to '/data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar:').
This is deprecated in Spark 1.0.
Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath
This means that setting the SPARK_CLASSPATH environment variable in spark-env.sh is no longer recommended; it can be replaced by the following recommended method:
./spark-sql --master yarn --driver-class-path /data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar
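Equivalently, as the warning suggests, the extra classpath entries can be set once in conf/spark-defaults.conf instead of being passed on every command line; this assumes the connector jar exists at the same path on every node:
# conf/spark-defaults.conf
spark.driver.extraClassPath    /data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar
spark.executor.extraClassPath  /data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar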
7. Related documents
"HBase-0.98.0 Distributed Installation Guide"
"Hive0. 12.0 Installation Guide"
"ZooKeeper-3.4.6 Distributed Installation Guide"
"Hadoop2.3.0 Source Code Reverse Engineering"
"Compiling Hadoop-2.4 on Linux .0》
《Accumulo-1.5.1 Installation Guide》
《Drill1.0.0 Installation Guide》
《Shark0.9.1 Installation Guide》
For more, please follow the technology blog: http://aquester.culog.cn.
