Hive 1.2.1 & Spark & Sqoop Installation Guide
Contents
1. Preface
2. Conventions
3. Service ports
4. Install MySQL
4.1. Install MySQL
4.2. Create the Hive metadata database
5. Installation steps
5.1. Download the Hive 1.2.1 binary installation package
5.2. Install Hive
5.3. Install MySQL-Connector
5.4. Modify configuration
5.4.1. Modify /etc/profile or ~/.profile
5.4.2. Modify other configuration files
5.4.2.1. Modify hive-env.sh
5.4.2.2. Modify hive-site.xml
5.4.2.3. Modify hive-log4j.properties
5.4.2.4. Modify hive-exec-log4j.properties
6. Start and run
7. Remotely execute HSQL
8. Basic commands
9. Single-point solution
10. Integrate with Spark
11. Integrate with Sqoop
11.1. Modify sqoop-env.sh
11.2. Modify sqoop-site.xml
11.3. Verification test
12. Common errors
13. Related documents
1. Preface
The installation in this article builds on the "Hive 0.12.0 Installation Guide"; the content comes from the official GettingStarted page. It installs Hive 1.2.1 on Hadoop 2.7.1, configures Hive in server mode, uses MySQL as the metadata database, and connects to MySQL remotely.
For the installation of Hadoop 2.7.1, please refer to the article "Hadoop-2.7.1 Distributed Installation Manual".
2. Conventions
This article assumes that Hadoop is installed in /data/hadoop/current and Hive 1.2.1 in /data/hadoop/hive (which actually points to /data/hadoop/hive-1.2.1-bin), and that MySQL 5.7.10 is installed in /data/mysql. In an actual deployment you can choose other directories.
3. Service ports
10000: the default value of hive.server2.thrift.port; this port is opened when hiveserver2 is started.
9083: the port in hive.metastore.uris; this port is opened when "hive --service metastore" is run.
4. Install MySQL
Since a single MySQL instance is a single point of failure, in practice it should be configured as an active-standby MySQL pair.
4.1. Install MySQL
MySQL in this article is installed on the machine 172.25.39.166. Hive uses MySQL to store its metadata, so MySQL must be installed first. The latest MySQL 5.7.10 is installed here; the download URL is http://dev.mysql.com/downloads/mysql/. This article chooses "Linux - Generic (glibc 2.5) (x86, 64-bit), Compressed TAR Archive" under "Linux - Generic"; the binary installation package is named mysql-5.7.10-linux-glibc2.5-x86_64.tar.gz.
After decompressing the binary package you will see a file named INSTALL-BINARY, which explains how to install MySQL; this article largely follows it.
Because the officially provided binary package was compiled with "--prefix" set to "/usr/local/mysql", it is best to install MySQL under /usr/local; otherwise the installation process is likely to run into problems. The data directory, however, should be created under a sufficiently large partition.
Alternatively, the data directory can be a soft link to a directory on a sufficiently large partition; the soft-link approach is recommended. Otherwise, MySQL commands such as mysqld, mysqld_safe and mysql_ssl_rsa_setup all need the "--datadir" parameter specified.
If MySQL is not installed in /usr/local/mysql, you need to pass mysqld many parameters such as --basedir, --character-sets-dir, --language, --lc-messages-dir and --plugin-dir.
If you cannot install as the root user, you also need to pass mysqld parameters such as --slow-query-log-file, --socket, --pid-file, --plugin-dir and --general-log-file.
The default values of these parameters can be viewed by running MySQL's "bin/mysqld --verbose --help".
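As a hedged example only, assuming MySQL is unpacked under /data/mysql (this article's convention) with its data directory on a large partition, initialization might look like the following:
# Sketch: initialize MySQL outside /usr/local; paths follow this article's conventions.
cd /data/mysql
bin/mysqld --initialize --user=mysql \
  --basedir=/data/mysql \
  --datadir=/data/mysql/data \
  --explicit_defaults_for_timestamp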
4.2. Create the Hive metadata database
Create the database user hive:
grant all on hive.* to 'hive'@'localhost' identified by 'hive2016';
grant all on hive.* to 'hive'@'172.25.39.166' identified by 'hive2016';
grant all on hive.* to 'hive'@'172.25.40.171' identified by 'hive2016';
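Note that the hive database itself must exist before the metastore is initialized (section 6); if your setup does not create it elsewhere, a minimal sketch:
create database if not exists hive;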
Log in to the hive database:
1) Locally: mysql -uhive -phive2016
Note that if MySQL master-master synchronization (or another synchronization scheme) is configured and the synchronized set does not include the mysql system database, the database and users must be created once on each MySQL instance.
5. Installation steps
5.1. Download the Hive 1.2.1 binary installation package
Download URL: http://hive.apache.org/downloads.html. The downloaded package is named apache-hive-1.2.1-bin.tar.gz; upload apache-hive-1.2.1-bin.tar.gz to the /data directory.
5.2. Install Hive
1) Switch to the /data directory: cd /data
2) Decompress the package: tar xzf apache-hive-1.2.1-bin.tar.gz
3) Rename the directory: mv apache-hive-1.2.1-bin hive-1.2.1
4) Create a soft link: ln -s hive-1.2.1 hive
5.3. Install MySQL-Connector
The compressed package "mysql-connector-java-5.1.38.tar.gz" contains mysql-connector-java-5.1.38-bin.jar. After decompression, copy mysql-connector-java-5.1.38-bin.jar into Hive's lib directory. This is the JDBC driver for MySQL.
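A minimal sketch of this step, assuming the tarball sits in /data:
cd /data
tar xzf mysql-connector-java-5.1.38.tar.gz
cp mysql-connector-java-5.1.38/mysql-connector-java-5.1.38-bin.jar /data/hadoop/hive/lib/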
5.4. Modify configuration
5.4.1. Modify /etc/profile or ~/.profile
Add Hive's environment variables:
export HIVE_HOME=/data/hadoop/hive
export PATH=$HIVE_HOME/bin:$PATH
5.4.2. Modify other configuration files
Enter Hive's conf directory, which initially contains only template files:
hadoop@VM-40-171-sles10-64:~/hive/conf> ls
hive-default.xml.template    hive-exec-log4j.properties.template
hive-env.sh.template         hive-log4j.properties.template
Create the working copies:
cp hive-env.sh.template hive-env.sh
cp hive-default.xml.template hive-site.xml
cp hive-log4j.properties.template hive-log4j.properties
cp hive-exec-log4j.properties.template hive-exec-log4j.properties
5.4.2.1. Modify hive-env.sh
Set the Hadoop installation directory:
HADOOP_HOME=/data/hadoop/current
5.4.2.2. Modify hive-site.xml
The items below are collected into a sample hive-site.xml fragment after this list.
1) Modify javax.jdo.option.ConnectionURL
Set the value to: jdbc:mysql://172.25.39.166:3306/hive?useSSL=false
Note the "useSSL=false"; other parameters, such as characterEncoding=UTF-8, can be appended as well.
2) Modify javax.jdo.option.ConnectionDriverName
Set the value to: com.mysql.jdbc.Driver
3) Modify javax.jdo.option.ConnectionUserName
Set the value to the user name for accessing the hive database: hive
4) Modify javax.jdo.option.ConnectionPassword
Set the value to the password for accessing the hive database: hive2016
5) Modify hive.metastore.schema.verification
Adjust this value as needed; it is related to common error 5) below.
6) Modify hive.zookeeper.quorum
Set the value to: 10.12.154.77,10.12.154.78,10.12.154.79 (ZooKeeper is installed on these three machines). Using host names instead of IPs is recommended, because retiring a machine may change its IP.
7) Modify hive.metastore.uris
Set the value to: thrift://172.25.40.171:9083; 9083 is the RPC service port of the Hive metastore.
8) Modify hive.metastore.warehouse.dir
Set the value to: /data/hadoop/hive/warehouse. Note that the directory must be created before starting (mkdir /data/hadoop/hive/warehouse).
9) Modify hive.server2.thrift.bind.host
This value defaults to localhost. If Hive must be reachable from other machines, change it to an IP address; this article changes it to 172.25.40.171. It may also be set to 0.0.0.0.
10) Modify hive.exec.scratchdir
This step is optional; the default value /tmp/hive can be used directly. Otherwise set it to /data/hadoop/hive/tmp or similar, and create the directory.
11) Modify hive.exec.local.scratchdir
Set it to /data/hadoop/hive/tmp/scratch or similar, and create the directory.
12) Modify hive.downloaded.resources.dir
Set it to /data/hadoop/hive/tmp/resources or similar, and create the directory.
13) Modify hive.querylog.location
Set it to /data/hadoop/hive/tmp/querylog or similar, and create the directory.
14) Modify hive.server2.logging.operation.log.location
Set it to /data/hadoop/hive/tmp/operation or similar, and create the directory.
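As a hedged illustration only (the values follow this article's conventions and should be adapted; inside XML the "&" in the JDBC URL must be escaped as "&amp;"), the resulting hive-site.xml entries might look like:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://172.25.39.166:3306/hive?characterEncoding=UTF-8&amp;useSSL=false</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive2016</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://172.25.40.171:9083</value>
</property>
The local directories from items 10) to 14) can be created in one step with, for example (bash brace expansion assumed): mkdir -p /data/hadoop/hive/tmp/{scratch,resources,querylog,operation}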
5.4.2.3. Modify hive-log4j.properties
Change the log directory from the default /tmp/${user.name} to /data/hadoop/hive/logs:
hive.log.dir=/data/hadoop/hive/logs
5.4.2.4. Modify hive-exec-log4j.properties
Likewise change the log directory, here to /data/hadoop/hive/logs/exec:
hive.log.dir=/data/hadoop/hive/logs/exec
Then create the directories /data/hadoop/hive/logs and /data/hadoop/hive/logs/exec.
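A hedged pair of one-liners for these two edits (sed assumed available; verify the property line in your copies first):
sed -i 's|^hive.log.dir=.*|hive.log.dir=/data/hadoop/hive/logs|' hive-log4j.properties
sed -i 's|^hive.log.dir=.*|hive.log.dir=/data/hadoop/hive/logs/exec|' hive-exec-log4j.properties
mkdir -p /data/hadoop/hive/logs/exec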
6. Start and run
1) Initialize the metastore
After installation and configuration, and before starting the Hive server, run "schematool -dbType mysql -initSchema" on the server to initialize the metastore.
If MySQL master-master synchronization is configured, this only needs to be run on one Hive machine; repeated execution will produce an error.
2) Start the metastore
Run: hive --service metastore &
3) Start the Hive service
Run: hiveserver2 &
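To confirm both services are listening (default ports per section 3), a quick check might be:
netstat -lntp | grep -E '9083|10000'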
4) Enter the Hive command-line interface (similar to mysql)
Run: hive
7. Remotely execute HSQL
Besides the hive command-line interface, hiveserver2 also provides beeline (hive is the username, hive2016 the password; see HiveServer2 Clients for more information):
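A hedged sketch of a remote beeline connection, using this article's host and credentials:
beeline -u jdbc:hive2://172.25.40.171:10000 -n hive -p hive2016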
On startup you may see harmless SLF4J multiple-binding warnings such as:
SLF4J: Found binding in [jar:file:/data/hadoop/hive-1.2.1-bin/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
8. Basic commands
CREATE TABLE pokes (foo INT, bar STRING);
CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
SHOW TABLES;
SHOW TABLES '.*s';
DESCRIBE invites;
DROP TABLE pokes;
Load data into a partition:
LOAD DATA LOCAL INPATH '../examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2014');
You can check the loaded data with "select * from invites;" or "select count(1) from invites;".
9. Single-point solution
The single point can be removed by deploying two Hives whose metadata database is MySQL, with MySQL and Hive deployed on the same machines and the two MySQL instances configured for master-master synchronization.
Hive then runs with one master and one hot standby; it is best to ensure that only one Hive provides service at any time, although in many cases both Hives can serve and work normally.
10. Integrate with Spark
Integrating Spark with Hive is very simple; only the following steps are needed:
1) Add HIVE_HOME to spark-env.sh, e.g.: export HIVE_HOME=/data/hadoop/hive
2) Copy Hive's hive-site.xml and hive-log4j.properties files into Spark's conf directory (see the sketch below).
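A minimal sketch of step 2), assuming SPARK_HOME points at the Spark installation:
cp $HIVE_HOME/conf/hive-site.xml $HIVE_HOME/conf/hive-log4j.properties $SPARK_HOME/conf/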
Once this is done, run spark-sql to enter Spark's SQL CLI, then run the command "show tables" to see the tables created in Hive.
Example:
./spark-sql --master yarn --driver-class-path /data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar
11. Integrate with Sqoop
Taking sqoop-1.4.6.bin__hadoop-2.0.4-alpha as an example: it supports incremental import, can import data not only into Hive but also into HBase, and can import data from a DB into HDFS storage. In short, Sqoop is very powerful; only a brief introduction is given here.
Download sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz from Sqoop's official site (download URL: http://www.apache.org/dyn/closer.lua/sqoop/1.4.6).
Unpack it, then enter Sqoop's conf directory and make the following changes:
11.1. Modify sqoop-env.sh
Copy sqoop-env-template.sh to sqoop-env.sh, then set the following environment variables in sqoop-env.sh (collected into a sketch after this list):
1) HADOOP_COMMON_HOME
The value is the Hadoop installation directory; example: export HADOOP_COMMON_HOME=/data/hadoop
2) HADOOP_MAPRED_HOME
The value is the directory containing the hadoop-common-*.jar files, which sits under the Hadoop installation directory.
Example: export HADOOP_MAPRED_HOME=/data/hadoop/share/hadoop/common
3) HBASE_HOME
The value is the HBase installation directory; example: export HBASE_HOME=/data/hbase
4) HIVE_HOME
The value is the Hive installation directory; example: export HIVE_HOME=/data/hive
5) ZOOCFGDIR
The value is the ZooKeeper configuration directory; example: export ZOOCFGDIR=/data/zookeeper/conf
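Collected together, a sqoop-env.sh under this article's example paths might read:
export HADOOP_COMMON_HOME=/data/hadoop
export HADOOP_MAPRED_HOME=/data/hadoop/share/hadoop/common
export HBASE_HOME=/data/hbase
export HIVE_HOME=/data/hive
export ZOOCFGDIR=/data/zookeeper/conf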
11.2. Modify sqoop-site.xml
Copy sqoop-site-template.xml to sqoop-site.xml; no modification is required.
11.3. Verification test
1) List the MySQL databases
./sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username zhangsan --password zhangsan2016
2) Create a Hive table from a MySQL table
./sqoop create-hive-table --connect jdbc:mysql://127.0.0.1:3306/test --username zhangsan --password zhangsan2016 --table t_test --hive-table t_test_2016
If the Hive table needs to be partitioned, specify the partition via the parameters --hive-partition-key and --hive-partition-value.
To overwrite an existing Hive table, just add the parameter "--hive-overwrite". The "--hive-partition-key" value is the partition column name, which defaults to string type, and "--hive-partition-value" is the partition value.
3) Import data from MySQL into Hive
./sqoop import --connect jdbc:mysql://127.0.0.1:3306/test --username zhangsan --password 'zhangsan2016' --table t_test --hive-import -m 6 --hive-table t_test_2016 --direct
It is recommended to include the parameter "--direct", which enables fast mode; for example, it makes Sqoop use the MySQL tool mysqldump to export data.
"-m" indicates how many maps are enabled to import data in parallel. The default is 4. It is best not to set the number higher than the maximum number of maps in the cluster.
"--table" is used to specify the name of the DB table to be imported, and "--hive-import" means to import data from DB to Hive. You can also use the parameter "--query" to conditionally export from the DB using SQL.
If you need to specify the character set, use the parameter "--default-character-set", e.g.: --default-character-set=UTF-8.
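As a hedged illustration of "--query" (the table and column names are hypothetical; with a free-form query Sqoop requires the $CONDITIONS token, a --split-by column when -m is greater than 1, and a --target-dir):
./sqoop import --connect jdbc:mysql://127.0.0.1:3306/test --username zhangsan --password 'zhangsan2016' \
  --query 'SELECT foo, bar FROM t_test WHERE $CONDITIONS' \
  --split-by foo --target-dir /tmp/t_test_import -m 4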
12. Common errors
1) TIMESTAMP with implicit DEFAULT value is deprecated
This warning appears when running MySQL's "bin/mysqld --initialize --user=mysql".
The reason is that since MySQL 5.6, the implicit default for TIMESTAMP columns is deprecated: a TIMESTAMP column not explicitly declared NULL defaults to NOT NULL, and setting such a column to NULL stores the current timestamp. Passing --explicit_defaults_for_timestamp (as in the next item) silences the warning.
2) Can't find error-message file '/usr/local/mysql/share/errmsg.sys'
This error occurs when running MySQL's "bin/mysqld --initialize --user=mysql --explicit_defaults_for_timestamp".
It may be because the data directory is not empty from a previous run. "bin/mysqld --verbose --help | grep datadir" shows that the default data directory is /var/lib/mysql/; either make sure /var/lib/mysql/ is empty, or change the data directory with the --datadir parameter, e.g. "bin/mysqld --initialize --user=mysql --explicit_defaults_for_timestamp --datadir=/data/mysql/data".
3) Can't find error-message file '/usr/local/mysql/share/errmsg.sys'
For the error:
Can't find error-message file '/usr/local/mysql/share/errmsg.sys'. Check error-message file location and 'lc-messages-dir' configuration directive.
The default installation directory of the MySQL build downloaded from the official site is /usr/local/mysql; if MySQL actually lives elsewhere, it is recommended to specify the location with the --basedir parameter, otherwise you will run into many installation problems. Running "bin/mysqld --verbose --help | grep basedir" shows that the default value of "--basedir" is /usr/local/mysql/.
4) Failed to connect to the MetaStore Server
If you hit this error when running hiveserver2, it is suggested to turn on the DEBUG log level for more detail by changing "hive.root.logger=WARN,DRFA" to "hive.root.logger=DEBUG,DRFA" in the log configuration file hive-log4j.properties.
After this change, run hiveserver2 again; the log becomes more detailed, and it turns out the metastore is not up. Start the metastore by running "hive --service metastore".
5) Version information not found in metastore
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
MetaException(message:Version information not found in metastore.)
at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:5638)
at org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:5622)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
at com.sun.proxy.$Proxy2.verifySchema(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:403)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:441)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:326)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler...
This error typically means the metastore schema has not been initialized (see "Initialize the metastore" in section 6), and it relates to the hive.metastore.schema.verification setting mentioned in 5.4.2.2.
6) java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir}/${system:user.name}
Solution: in hive-site.xml, replace every occurrence of ${system:java.io.tmpdir} with an absolute path; hive-1.2.1 has 4 such occurrences in total.
7) Establishing SSL connection without server's identity verification is not recommended
Problem:
Wed Feb 17 10:39:37 CST 2016 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
The solution is to append "useSSL=false" to the javax.jdo.option.ConnectionURL value in hive-site.xml, e.g.:
jdbc:mysql://127.0.0.1:3306/hive?characterEncoding=UTF-8&useSSL=false (inside the XML file, write the "&" as "&amp;").
8) SPARK_CLASSPATH was detected
SPARK_CLASSPATH was detected (set to '/data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar:').
This is deprecated in Spark 1.0+.
Please instead use:
- ./spark-submit with --driver-class-path to augment the driver classpath
- spark.executor.extraClassPath to augment the executor classpath
This means setting the environment variable SPARK_CLASSPATH in spark-env.sh is no longer recommended; use the recommended approach instead:
./spark-sql --master yarn --driver-class-path /data/hadoop/hive/lib/mysql-connector-java-5.1.38-bin.jar
"HBase-0.98.0 Distributed Installation Guide"
"Hive1.2.1 Installation Guide"
"ZooKeeper-3.4.6 Distributed Installation Guide"
"Hadoop2.3.0 Source Code Reverse Engineering"
"Compiling on Linux Hadoop-2.7.1》
《Accumulo-1.5.1 Installation Guide》
《Drill1.0.0 Installation Guide》
《Shark0.9.1 Installation Guide》
For more, follow the author's technology blog: http://aquester.cublog.cn.