Hadoop集群(CDH4)实践之 (1) Hadoop(HDFS)搭建-Mysql Tutorial-php.cn

Home

Database

Mysql Tutorial

Hadoop集群(CDH4)实践之 (1) Hadoop(HDFS)搭建

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jun 07, 2016 pm 04:33 PM

hadoophdfspracticebuildTable of contentscluster

目录结构 Hadoop集群(CDH4)实践之 (0) 前言 Hadoop集群(CDH4)实践之 (1) Hadoop(HDFS)搭建 Hadoop集群(CDH4)实践之 (2) HBaseZookeeper搭建 Hadoop集群(CDH4)实践之 (3) Hive搭建 Hadoop集群(CHD4)实践之 (4) Oozie搭建 Hadoop集群(CHD4)实践之 (5) Sqoop安

目录结构
Hadoop集群(CDH4)实践之 (0) 前言
Hadoop集群(CDH4)实践之 (1) Hadoop(HDFS)搭建
Hadoop集群(CDH4)实践之 (2) HBase&Zookeeper搭建
Hadoop集群(CDH4)实践之 (3) Hive搭建
Hadoop集群(CHD4)实践之 (4) Oozie搭建
Hadoop集群(CHD4)实践之 (5) Sqoop安装

本文内容
Hadoop集群(CDH4)实践之 (1) Hadoop(HDFS)搭建

参考资料
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/CDH4-Installation-Guide.html

环境准备
OS: CentOS 6.4 x86_64
Servers:
hadoop-master: 172.17.20.230 内存10G
- namenode

hadoop- secondarynamenode: 172.17.20.234 内存10G
- secondarybackupnamenode,jobtracker

hadoop-node-1: 172.17.20.231 内存10G
- datanode,tasktracker

hadoop-node-2: 172.17.20.232 内存10G
- datanode,tasktracker

hadoop-node-3: 172.17.20.233 内存10G
- datanode,tasktracker

对以上角色做一些简单的介绍：
namenode - 整个HDFS的命名空间管理服务
secondarynamenode - 可以看做是namenode的冗余服务
jobtracker - 并行计算的job管理服务
datanode - HDFS的节点服务
tasktracker - 并行计算的job执行服务

本文定义的规范，避免在配置多台服务器上产生理解上的混乱：
所有直接以 $ 开头，没有跟随主机名的命令，都代表需要在所有的服务器上执行，除非后面有单独的//开头或在标题说明。

1. 选择最好的安装包
为了更方便和更规范的部署Hadoop集群，我们采用Cloudera的集成包。
因为Cloudera对Hadoop相关的系统做了很多优化，避免了很多因各个系统间版本不符产生的很多Bug。
这也是很多资深Hadoop管理员所推荐的。
https://ccp.cloudera.com/display/DOC/Documentation/

2. 安装Java环境
由于整个Hadoop项目主要是通过Java开发完成的，因此需要JVM的支持。
登陆www.oracle.com（需要创建一个ID），从以下地址下载一个64位的JDK，如jdk-7u45-linux-x64.rpm

http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html

$ sudo rpm -ivh jdk-7u45-linux-x64.rpm
$ sudo vim /etc/profile.d/java.sh

 
export JAVA_HOME=/usr/java/jdk1.7.0_45
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

$ sudo chmod +x /etc/profile.d/java.sh
$ source /etc/profile

3. 配置Hadoop安装源
$ sudo rpm --import http://archive.cloudera.com/cdh4/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera
$ cd /etc/yum.repos.d/
$ sudo wget http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/cloudera-cdh4.repo

4. 安装Hadoop相关套件，选择MRv1的框架支持
$ sudo yum install hadoop-hdfs-namenode //仅在hadoop-master上安装

$ sudo yum install hadoop-hdfs-secondarynamenode //仅在hadoop-secondary上安装
$ sudo yum install hadoop-0.20-mapreduce-jobtracker //仅在hadoop-secondary上安装

$ sudo yum install hadoop-hdfs-datanode //仅在hadoop-node上安装
$ sudo yum install hadoop-0.20-mapreduce-tasktracker //仅在hadoop-node上安装

$ sudo yum install hadoop-client

5. 创建Hadoop配置文件
$ sudo cp -r /etc/hadoop/conf.dist /etc/hadoop/conf.my_cluster

6. 激活新的配置文件
$ sudo alternatives --verbose --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
$ sudo alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
$ cd /etc/hadoop/conf

7. 添加hosts记录并修改对应的主机名
$ sudo vim /etc/hosts

 
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.17.20.230 hadoop-master
172.17.20.234 hadoop-secondary
172.17.20.231 hadoop-node-1
172.17.20.232 hadoop-node-2
172.17.20.233 hadoop-node-3

8. 安装LZO支持
$ cd /etc/yum.repos.d
$ sudo wget http://archive.cloudera.com/gplextras/redhat/6/x86_64/gplextras/cloudera-gplextras4.repo
$ sudo yum install hadoop-lzo-cdh4

9. 配置hadoop/conf下的文件
$ sudo vim /etc/hadoop/conf/masters

 
hadoop-master

$ sudo vim /etc/hadoop/conf/slaves

 
hadoop-node-1
hadoop-node-2
hadoop-node-3

10. 创建hadoop的HDFS目录
$ sudo mkdir -p /data/{1,2,3,4}/mapred/local
$ sudo chown -R mapred:hadoop /data/{1,2,3,4}/mapred/local

$ sudo mkdir -p /data/1/dfs/nn /nfsmount/dfs/nn /data/1/dfs/ns /data/{1,2,3,4}/dfs/dn
$ sudo chown -R hdfs:hdfs /data/1/dfs/nn /nfsmount/dfs/nn /data/1/dfs/ns /data/{1,2,3,4}/dfs/dn
$ sudo chmod 700 /data/1/dfs/nn /nfsmount/dfs/nn /data/1/dfs/ns /data/{1,2,3,4}/dfs/dn

$ sudo mkdir /data/tmp
$ sudo chmod 1777 /data/tmp

11. 配置core-site.xml
$ sudo vim /etc/hadoop/conf/core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="http://heylinux.com/archives/configuration.xsl"?>
 fs.defaultFS
 hdfs://hadoop-master:8020
 hadoop.tmp.dir
 /data/tmp/hadoop-${user.name}
  hadoop.proxyuser.oozie.hosts
  *
  hadoop.proxyuser.oozie.groups
  *
  hadoop.proxyuser.hive.hosts
  *
  hadoop.proxyuser.hive.groups
  *

12. 配置hdfs-site.xml
$ sudo vim /etc/hadoop/conf/hdfs-site.xml

 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="http://heylinux.com/archives/configuration.xsl"?>
 dfs.namenode.name.dir
 /data/1/dfs/nn,/nfsmount/dfs/nn
  dfs.namenode.http-address
  hadoop-master:50070
  fs.namenode.checkpoint.period
  3600
  fs.namenode.checkpoint.dir
  /data/1/dfs/ns
  dfs.namenode.secondary.http-address
  hadoop-secondary:50090
  dfs.replication
  3
 dfs.permissions.superusergroup
 supergroup
 dfs.datanode.data.dir
 /data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn
  dfs.datanode.max.xcievers
  4096

13. 配置mapred-site.xml
$ sudo vim /etc/hadoop/conf/mapred-site.xml

 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="http://heylinux.com/archives/configuration.xsl"?>
 mapred.job.tracker
 hadoop-secondary:8021
 mapred.local.dir
 /data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local

14. 格式化HDFS分布式文件系统
$ sudo -u hdfs hadoop namenode -format //仅在hadoop-master上执行一次

15. 启动Hadoop进程
在hadoop-master上启动namenode
$ sudo /etc/init.d//etc/init.d/hadoop-hdfs-namenode start

在hadoop-secondary上启动secondarynamenode,jobtracker
$ sudo /etc/init.d/hadoop-hdfs-secondarynamenode start
$ sudo /etc/init.d/hadoop-0.20-mapreduce-jobtracker start

在hadoop-node上启动datanode,tasktracker
$ sudo /etc/init.d/hadoop-hdfs-datanode start
$ sudo /etc/init.d/hadoop-0.20-mapreduce-tasktracker start

16. 创建mapred.system.dir以及/tmp HDFS目录
以下HDFS操作仅需在任意一台主机上执行一次
$ sudo -u hdfs hadoop fs -mkdir /tmp
$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
$ sudo -u hdfs hadoop fs -mkdir -p /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
$ sudo -u hdfs hadoop fs -chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
$ sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred
$ sudo -u hdfs hadoop fs -ls -R /
$ sudo -u hdfs hadoop fs -mkdir /tmp/mapred/system
$ sudo -u hdfs hadoop fs -chown mapred:hadoop /tmp/mapred/system

17. 配置HADOOP_MAPRED_HOME
$ sudo vim /etc/profile.d/hadoop.sh

 
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-0.20-mapreduce

$ source /etc/profile

18. 查看整个集群的状态
通过网页进行查看：http://hadoop-master:50070

19. 至此，Hadoop(HDFS)的搭建就已经完成。

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

How does MySQL's licensing compare to other database systems?Apr 25, 2025 am 12:26 AM

MySQL uses a GPL license. 1) The GPL license allows the free use, modification and distribution of MySQL, but the modified distribution must comply with GPL. 2) Commercial licenses can avoid public modifications and are suitable for commercial applications that require confidentiality.

When would you choose InnoDB over MyISAM, and vice versa?Apr 25, 2025 am 12:22 AM

The situations when choosing InnoDB instead of MyISAM include: 1) transaction support, 2) high concurrency environment, 3) high data consistency; conversely, the situation when choosing MyISAM includes: 1) mainly read operations, 2) no transaction support is required. InnoDB is suitable for applications that require high data consistency and transaction processing, such as e-commerce platforms, while MyISAM is suitable for read-intensive and transaction-free applications such as blog systems.

Explain the purpose of foreign keys in MySQL.Apr 25, 2025 am 12:17 AM

In MySQL, the function of foreign keys is to establish the relationship between tables and ensure the consistency and integrity of the data. Foreign keys maintain the effectiveness of data through reference integrity checks and cascading operations. Pay attention to performance optimization and avoid common errors when using them.

What are the different types of indexes in MySQL?Apr 25, 2025 am 12:12 AM

There are four main index types in MySQL: B-Tree index, hash index, full-text index and spatial index. 1.B-Tree index is suitable for range query, sorting and grouping, and is suitable for creation on the name column of the employees table. 2. Hash index is suitable for equivalent queries and is suitable for creation on the id column of the hash_table table of the MEMORY storage engine. 3. Full text index is used for text search, suitable for creation on the content column of the articles table. 4. Spatial index is used for geospatial query, suitable for creation on geom columns of locations table.

How do you create an index in MySQL?Apr 25, 2025 am 12:06 AM

TocreateanindexinMySQL,usetheCREATEINDEXstatement.1)Forasinglecolumn,use"CREATEINDEXidx_lastnameONemployees(lastname);"2)Foracompositeindex,use"CREATEINDEXidx_nameONemployees(lastname,firstname);"3)Forauniqueindex,use"CREATEU

How does MySQL differ from SQLite?Apr 24, 2025 am 12:12 AM

The main difference between MySQL and SQLite is the design concept and usage scenarios: 1. MySQL is suitable for large applications and enterprise-level solutions, supporting high performance and high concurrency; 2. SQLite is suitable for mobile applications and desktop software, lightweight and easy to embed.

What are indexes in MySQL, and how do they improve performance?Apr 24, 2025 am 12:09 AM

Indexes in MySQL are an ordered structure of one or more columns in a database table, used to speed up data retrieval. 1) Indexes improve query speed by reducing the amount of scanned data. 2) B-Tree index uses a balanced tree structure, which is suitable for range query and sorting. 3) Use CREATEINDEX statements to create indexes, such as CREATEINDEXidx_customer_idONorders(customer_id). 4) Composite indexes can optimize multi-column queries, such as CREATEINDEXidx_customer_orderONorders(customer_id,order_date). 5) Use EXPLAIN to analyze query plans and avoid

Explain how to use transactions in MySQL to ensure data consistency.Apr 24, 2025 am 12:09 AM

Using transactions in MySQL ensures data consistency. 1) Start the transaction through STARTTRANSACTION, and then execute SQL operations and submit it with COMMIT or ROLLBACK. 2) Use SAVEPOINT to set a save point to allow partial rollback. 3) Performance optimization suggestions include shortening transaction time, avoiding large-scale queries and using isolation levels reasonably.

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Assassin's Creed Shadows: Seashell Riddle Solution

4 weeks agoByDDD

What's New in Windows 11 KB5054979 & How to Fix Update Issues

3 weeks agoByDDD

Where to find the Crane Control Keycard in Atomfall

4 weeks agoByDDD

Roblox: Dead Rails - How To Complete Every Challenge

1 months agoByDDD

Atomfall guide: item locations, quest guides, and tips

1 months agoByDDD

Hot Tools

SAP NetWeaver Server Adapter for Eclipse

Integrate Eclipse with SAP NetWeaver application server.

DVWA

Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software