Hadoop可以在单节点上以所谓的伪分布式模式运行,此时每一个Hadoop守护进程都作为一个独立的Java进程运行。本文通过自动化脚本配置Hadoop伪分布式模式。测试环境为VMware中的Centos 6.3, Hadoop 1.2.1.其他版本未测试。 伪分布式配置脚本 包括配置core-site.
Hadoop可以在单节点上以所谓的伪分布式模式运行,此时每一个Hadoop守护进程都作为一个独立的Java进程运行。本文通过自动化脚本配置Hadoop伪分布式模式。测试环境为VMware中的Centos 6.3, Hadoop 1.2.1.其他版本未测试。
伪分布式配置脚本
包括配置core-site.xml,hdfs-site.xml及mapred-site.xml,配置ssh免密码登陆。[1]
#!/bin/bash # Usage: Hadoop伪分布式配置 # History: # 20140426 annhe 完成基本功能 # Check if user is root if [ $(id -u) != "0" ]; then printf "Error: You must be root to run this script!\n" exit 1 fi #同步时钟 rm -rf /etc/localtime ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime #yum install -y ntp ntpdate -u pool.ntp.org &>/dev/null echo -e "Time: `date` \n" #默认为单网卡结构,多网卡的暂不考虑 IP=`ifconfig eth0 |grep "inet\ addr" |awk '{print $2}' |cut -d ":" -f2` #伪分布式 function PseudoDistributed () { cd /etc/hadoop/ #恢复备份 mv core-site.xml.bak core-site.xml mv hdfs-site.xml.bak hdfs-site.xml mv mapred-site.xml.bak mapred-site.xml #备份 mv core-site.xml core-site.xml.bak mv hdfs-site.xml hdfs-site.xml.bak mv mapred-site.xml mapred-site.xml.bak #使用下面的core-site.xml cat > core-site.xml <?xml-stylesheet type="text/xsl" href="http://www.annhe.net/configuration.xsl"?> <!-- Put site-specific property overrides in this file. --> <configuration> <property> <name>fs.default.name</name> <value>hdfs://$IP:9000</value> </property> </configuration> eof #使用下面的hdfs-site.xml cat > hdfs-site.xml <?xml-stylesheet type="text/xsl" href="http://www.annhe.net/configuration.xsl"?> <!-- Put site-specific property overrides in this file. --> <configuration> <property> <name>dfs.replication</name> <value>1</value> </property> </configuration> eof #使用下面的mapred-site.xml cat > mapred-site.xml <?xml-stylesheet type="text/xsl" href="http://www.annhe.net/configuration.xsl"?> <!-- Put site-specific property overrides in this file. --> <configuration> <property> <name>mapred.job.tracker</name> <value>$IP:9001</value> </property> </configuration> eof } #配置ssh免密码登陆 function PassphraselessSSH () { #不重复生成私钥 [ ! -f ~/.ssh/id_dsa ] && ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa cat ~/.ssh/authorized_keys |grep "`cat ~/.ssh/id_dsa.pub`" &>/dev/null && r=0 || r=1 #没有公钥的时候才添加 [ $r -eq 1 ] && cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys chmod 644 ~/.ssh/authorized_keys } #执行 function Execute () { #格式化一个新的分布式文件系统 hadoop namenode -format #启动Hadoop守护进程 start-all.sh echo -e "\n========================================================================" echo "hadoop log dir : $HADOOP_LOG_DIR" echo "NameNode - http://$IP:50070/" echo "JobTracker - http://$IP:50030/" echo -e "\n=========================================================================" } PseudoDistributed 2>&1 | tee -a pseudo.log PassphraselessSSH 2>&1 | tee -a pseudo.log Execute 2>&1 | tee -a pseudo.log
脚本测试结果
[root@hadoop hadoop]# ./pseudo.sh 14/04/26 23:52:30 INFO namenode.NameNode: STARTUP_MSG: /************************************************************ STARTUP_MSG: Starting NameNode STARTUP_MSG: host = hadoop/216.34.94.184 STARTUP_MSG: args = [-format] STARTUP_MSG: version = 1.2.1 STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:27:42 PDT 2013 STARTUP_MSG: java = 1.7.0_51 ************************************************************/ Re-format filesystem in /tmp/hadoop-root/dfs/name ? (Y or N) y Format aborted in /tmp/hadoop-root/dfs/name 14/04/26 23:52:40 INFO namenode.NameNode: SHUTDOWN_MSG: /************************************************************ SHUTDOWN_MSG: Shutting down NameNode at hadoop/216.34.94.184 ************************************************************/ starting namenode, logging to /var/log/hadoop/root/hadoop-root-namenode-hadoop.out localhost: starting datanode, logging to /var/log/hadoop/root/hadoop-root-datanode-hadoop.out localhost: starting secondarynamenode, logging to /var/log/hadoop/root/hadoop-root-secondarynamenode-hadoop.out starting jobtracker, logging to /var/log/hadoop/root/hadoop-root-jobtracker-hadoop.out localhost: starting tasktracker, logging to /var/log/hadoop/root/hadoop-root-tasktracker-hadoop.out ======================================================================== hadoop log dir : /var/log/hadoop/root NameNode - http://192.168.60.128:50070/ JobTracker - http://192.168.60.128:50030/ =========================================================================
通过宿主机上的浏览器访问NameNode和JobTracker的网络接口
浏览器访问namenode的网络接口
浏览器访问jobtracker网络接口
运行测试程序
将输入文件拷贝到分布式文件系统:
$ hadoop fs -put input input
通过网络接口查看hdfs
通过NameNode网络接口查看hdfs文件系统
运行示例程序
[root@hadoop hadoop]# hadoop jar /usr/share/hadoop/hadoop-examples-1.2.1.jar wordcount input output
通过JobTracker网络接口查看执行状态
Wordcount执行状态
执行结果
[root@hadoop hadoop]# hadoop jar /usr/share/hadoop/hadoop-examples-1.2.1.jar wordcount input out2 14/04/27 03:34:56 INFO input.FileInputFormat: Total input paths to process : 2 14/04/27 03:34:56 INFO util.NativeCodeLoader: Loaded the native-hadoop library 14/04/27 03:34:56 WARN snappy.LoadSnappy: Snappy native library not loaded 14/04/27 03:34:57 INFO mapred.JobClient: Running job: job_201404270333_0001 14/04/27 03:34:58 INFO mapred.JobClient: map 0% reduce 0% 14/04/27 03:35:49 INFO mapred.JobClient: map 100% reduce 0% 14/04/27 03:36:16 INFO mapred.JobClient: map 100% reduce 100% 14/04/27 03:36:19 INFO mapred.JobClient: Job complete: job_201404270333_0001 14/04/27 03:36:19 INFO mapred.JobClient: Counters: 29 14/04/27 03:36:19 INFO mapred.JobClient: Job Counters 14/04/27 03:36:19 INFO mapred.JobClient: Launched reduce tasks=1 14/04/27 03:36:19 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=72895 14/04/27 03:36:19 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 14/04/27 03:36:19 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 14/04/27 03:36:19 INFO mapred.JobClient: Launched map tasks=2 14/04/27 03:36:19 INFO mapred.JobClient: Data-local map tasks=2 14/04/27 03:36:19 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=24880 14/04/27 03:36:19 INFO mapred.JobClient: File Output Format Counters 14/04/27 03:36:19 INFO mapred.JobClient: Bytes Written=25 14/04/27 03:36:19 INFO mapred.JobClient: FileSystemCounters 14/04/27 03:36:19 INFO mapred.JobClient: FILE_BYTES_READ=55 14/04/27 03:36:19 INFO mapred.JobClient: HDFS_BYTES_READ=260 14/04/27 03:36:19 INFO mapred.JobClient: FILE_BYTES_WRITTEN=164041 14/04/27 03:36:19 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=25 14/04/27 03:36:19 INFO mapred.JobClient: File Input Format Counters 14/04/27 03:36:19 INFO mapred.JobClient: Bytes Read=25 14/04/27 03:36:19 INFO mapred.JobClient: Map-Reduce Framework 14/04/27 03:36:19 INFO mapred.JobClient: Map output materialized bytes=61 14/04/27 03:36:19 INFO mapred.JobClient: Map input records=2 14/04/27 03:36:19 INFO mapred.JobClient: Reduce shuffle bytes=61 14/04/27 03:36:19 INFO mapred.JobClient: Spilled Records=8 14/04/27 03:36:19 INFO mapred.JobClient: Map output bytes=41 14/04/27 03:36:19 INFO mapred.JobClient: Total committed heap usage (bytes)=414441472 14/04/27 03:36:19 INFO mapred.JobClient: CPU time spent (ms)=2910 14/04/27 03:36:19 INFO mapred.JobClient: Combine input records=4 14/04/27 03:36:19 INFO mapred.JobClient: SPLIT_RAW_BYTES=235 14/04/27 03:36:19 INFO mapred.JobClient: Reduce input records=4 14/04/27 03:36:19 INFO mapred.JobClient: Reduce input groups=3 14/04/27 03:36:19 INFO mapred.JobClient: Combine output records=4 14/04/27 03:36:19 INFO mapred.JobClient: Physical memory (bytes) snapshot=353439744 14/04/27 03:36:19 INFO mapred.JobClient: Reduce output records=3 14/04/27 03:36:19 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2195972096 14/04/27 03:36:19 INFO mapred.JobClient: Map output records=4
查看结果
[root@hadoop hadoop]# hadoop fs -cat out2/* hadoop 1 hello 2 world 1
也可以将分布式文件系统上的文件拷贝到本地查看
[root@hadoop hadoop]# hadoop fs -get out2 out4 [root@hadoop hadoop]# cat out4/* cat: out4/_logs: Is a directory hadoop 1 hello 2 world 1
完成全部操作后,停止守护进程:
[root@hadoop hadoop]# stop-all.sh stopping jobtracker localhost: stopping tasktracker stopping namenode localhost: stopping datanode localhost: stopping secondarynamenode
遇到的问题
宿主机不能访问网络接口
因为开启了iptables,所以需要添加相应端口,当然测试环境也可以直接将iptables关闭。
# Firewall configuration written by system-config-firewall # Manual customization of this file is not recommended. *filter :INPUT ACCEPT [0:0] :FORWARD ACCEPT [0:0] :OUTPUT ACCEPT [0:0] -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT -A INPUT -p icmp -j ACCEPT -A INPUT -i lo -j ACCEPT -A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT -A INPUT -m state --state NEW -m tcp -p tcp --dport 50070 -j ACCEPT -A INPUT -m state --state NEW -m tcp -p tcp --dport 50030 -j ACCEPT -A INPUT -m state --state NEW -m tcp -p tcp --dport 50075 -j ACCEPT -A INPUT -j REJECT --reject-with icmp-host-prohibited -A FORWARD -j REJECT --reject-with icmp-host-prohibited COMMIT
Browse the filesystem跳转地址不对
NameNode网络接口点击Browse the filesystem,跳转到localhost:50075。[2][3]
修改core-site.xml,将hdfs://localhost:9000改成虚拟机ip地址。(上面的脚本已经改写为自动配置为IP)。
根据几次改动的情况,这里也是可以填写域名的,只是要在访问的机器上能解析这个域名。因此公网环境中有DNS服务器的应该是可以设置域名的。
执行reduce的时候卡死
在/etc/hosts中添加主机名对应的ip地址 [4][5]。(已更新Hadoop安装脚本,会自动配置此项)
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 127.0.0.1 hadoop #添加这一行
参考文献
[1]. Hadoop官方文档.?http://hadoop.apache.org/docs/r1.2.1/single_node_setup.html
[2]. Stackoverflow.?http://stackoverflow.com/questions/15254492/wrong-redirect-from-hadoop-hdfs-namenode-to-localhost50075
[3]. Iteye.?http://yymmiinngg.iteye.com/blog/706909
[4].Stackoverflow.?http://stackoverflow.com/questions/10165549/hadoop-wordcount-example-stuck-at-map-100-reduce-0
[5]. 李俊的博客.?http://www.colorlight.cn/archives/32
本文遵从CC版权协定,转载请以链接形式注明出处。
本文链接地址: http://www.annhe.net/article-2682.html

Mysqlstringtypesimpactstorageandperformanceasfollows: 1) che-chexed-length, selingasingthesamestoragespace, whycanbefasterbutlessspace-efficient.2) varcharisvariable-length, morespace-efficientbutpotoTanSlower.3)

Mysqlstringtypesincludevarchar, teks, char, enum, andset.1) varcharisversatorvariable-lengtstringsuptoaspecifiedlimit.2)

Mysqloffersvariousstringdatatypes: 1) charforfixed-lengtstrings, 2) varcharforvariable-lengthtext, 3) binaryandvarbinaryforbinarydata, 4) blobandtextforlargedata, dan5)

TograntpermissionstonewMySQLusers,followthesesteps:1)AccessMySQLasauserwithsufficientprivileges,2)CreateanewuserwiththeCREATEUSERcommand,3)UsetheGRANTcommandtospecifypermissionslikeSELECT,INSERT,UPDATE,orALLPRIVILEGESonspecificdatabasesortables,and4)

Toaddusersinmysqleffectivelyandsecurely, ikutiTheSesteps: 1) usethecreateUserStatementToadDanewuser, spesifyingthehostandastrongpassword.2) GrantnessaryPrivileGeSingSupingTheGrantement, ADHERINGTOTHEPRINCIPREFLEFLEASE.3)

TOADDANEWUSERWITHCEPLEXPELPISIONSIONSIONMYSQL, FOLLONGHESESTEPS: 1) COTETETHEUSERWITHCEATEUSER'NEWUSER '@' LOCSOUSTHOST'IDENTIFIFYBY'PA ssword ';. 2) grantrearaccesstoalltablesin'mydatabase'withgrantselectonmydatabase.to'newuser'@'localhost' ;. 3) GrantWriteAccessto '

Jenis data rentetan di MySQL termasuk char, varchar, binari, varbinary, gumpalan, dan teks. Kolaborasi menentukan perbandingan dan menyusun rentetan. 1.BARI sesuai untuk rentetan panjang tetap, Varchar sesuai untuk rentetan panjang berubah-ubah. 2.Binary dan Varbinary digunakan untuk data binari, dan gumpalan dan teks digunakan untuk data objek besar. 3. Peraturan menyusun seperti UTF8MB4_UNICODE_CI mengabaikan kes atas dan bawah dan sesuai untuk nama pengguna; UTF8MB4_BIN adalah sensitif kes dan sesuai untuk bidang yang memerlukan perbandingan yang tepat.

Pemilihan panjang lajur MySqlvarchar terbaik harus berdasarkan analisis data, pertimbangkan pertumbuhan masa depan, menilai kesan prestasi, dan keperluan set aksara. 1) menganalisis data untuk menentukan panjang biasa; 2) Rizab ruang pengembangan masa depan; 3) memberi perhatian kepada kesan panjang besar pada prestasi; 4) Pertimbangkan kesan set aksara pada penyimpanan. Melalui langkah -langkah ini, kecekapan dan skalabiliti pangkalan data dapat dioptimumkan.


Alat AI Hot

Undresser.AI Undress
Apl berkuasa AI untuk mencipta foto bogel yang realistik

AI Clothes Remover
Alat AI dalam talian untuk mengeluarkan pakaian daripada foto.

Undress AI Tool
Gambar buka pakaian secara percuma

Clothoff.io
Penyingkiran pakaian AI

Video Face Swap
Tukar muka dalam mana-mana video dengan mudah menggunakan alat tukar muka AI percuma kami!

Artikel Panas

Alat panas

SecLists
SecLists ialah rakan penguji keselamatan muktamad. Ia ialah koleksi pelbagai jenis senarai yang kerap digunakan semasa penilaian keselamatan, semuanya di satu tempat. SecLists membantu menjadikan ujian keselamatan lebih cekap dan produktif dengan menyediakan semua senarai yang mungkin diperlukan oleh penguji keselamatan dengan mudah. Jenis senarai termasuk nama pengguna, kata laluan, URL, muatan kabur, corak data sensitif, cangkerang web dan banyak lagi. Penguji hanya boleh menarik repositori ini ke mesin ujian baharu dan dia akan mempunyai akses kepada setiap jenis senarai yang dia perlukan.

PhpStorm versi Mac
Alat pembangunan bersepadu PHP profesional terkini (2018.2.1).

SublimeText3 Linux versi baharu
SublimeText3 Linux versi terkini

mPDF
mPDF ialah perpustakaan PHP yang boleh menjana fail PDF daripada HTML yang dikodkan UTF-8. Pengarang asal, Ian Back, menulis mPDF untuk mengeluarkan fail PDF "dengan cepat" dari tapak webnya dan mengendalikan bahasa yang berbeza. Ia lebih perlahan dan menghasilkan fail yang lebih besar apabila menggunakan fon Unicode daripada skrip asal seperti HTML2FPDF, tetapi menyokong gaya CSS dsb. dan mempunyai banyak peningkatan. Menyokong hampir semua bahasa, termasuk RTL (Arab dan Ibrani) dan CJK (Cina, Jepun dan Korea). Menyokong elemen peringkat blok bersarang (seperti P, DIV),

SublimeText3 versi Mac
Perisian penyuntingan kod peringkat Tuhan (SublimeText3)
