HDFS file commands are modeled on the Linux file commands, so anyone familiar with the Linux command line will find them easy to pick up. Note that Hadoop DFS has no notion of a working directory (no pwd), so every path must be given as a full path. (This article is based on Hadoop 2.5, CDH 5.2.1.)
Listing the available commands, their usage and help, and selecting a namenode other than the one in the configuration file:
hdfs dfs -usage
hadoop dfs -usage ls
hadoop dfs -help
-fs <local|namenode:port>    specify a namenode
hdfs dfs -fs hdfs://test1:9000 -ls /
——————————————————————————–
-df [-h] [path …] :
Shows the capacity, free and used space of the filesystem. If the filesystem has
multiple partitions, and no path to a particular partition is specified, then
the status of the root partitions will be shown.
$ hdfs dfs -df
Filesystem               Size   Used     Available  Use%
hdfs://test1:9000  413544071168  98304  345612906496    0%
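The -h flag is not shown above; as a quick illustrative sketch, it prints the same columns in human-readable units (the 413544071168-byte capacity from the output above corresponds to roughly 385.1 G):
hdfs dfs -df -h
# same columns as the plain -df output, but with sizes such as 385.1 G instead of raw byte counts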
——————————————————————————–
-mkdir [-p] path … :
Create a directory in specified location.
-p Do not fail if the directory already exists
-rmdir dir … :
Removes the directory entry specified by each directory argument, provided it is
empty.
hdfs dfs -mkdir /tmp
hdfs dfs -mkdir /tmp/txt
hdfs dfs -rmdir /tmp/txt
hdfs dfs -mkdir -p /tmp/txt/hello
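To see why -p matters, here is a small sketch with illustrative path names: without -p, creating a nested path fails when an intermediate directory does not exist yet.
hdfs dfs -mkdir /tmp/txt/a/b      # fails: /tmp/txt/a does not exist
hdfs dfs -mkdir -p /tmp/txt/a/b   # succeeds, creating the intermediate directories
hdfs dfs -rmdir /tmp/txt/a/b
hdfs dfs -rmdir /tmp/txt/a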
——————————————————————————–
-copyFromLocal [-f] [-p] localsrc … dst :
Identical to the -put command.
-copyToLocal [-p] [-ignoreCrc] [-crc] src … localdst :
Identical to the -get command.
-moveFromLocal localsrc … dst :
Same as -put, except that the source is deleted after it’s copied.
-put [-f] [-p] localsrc … dst :
Copy files from the local file system into fs. Copying fails if the file already
exists, unless the -f flag is given. Passing -p preserves access and
modification times, ownership and the mode. Passing -f overwrites the
destination if it already exists.
-get [-p] [-ignoreCrc] [-crc] src … localdst :
Copy files that match the file pattern src to the local name. src is kept.
When copying multiple files, the destination must be a directory. Passing -p
preserves access and modification times, ownership and the mode.
-getmerge [-nl] src localdst :
Get all the files in the directories that match the source file pattern and
merge and sort them to only one file on local fs. src is kept.
-nl Add a newline character at the end of each file.
-cat [-ignoreCrc] src … :
Fetch all files that match the file pattern src and display their content on
stdout.
echo "Hello, Hadoop" > hadoop.txt
echo "Hello, HDFS" > hdfs.txt
dd if=/dev/zero of=/tmp/test.zero bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.93978 s, 1.1 GB/s
hdfs dfs -moveFromLocal /tmp/test.zero /tmp
hdfs dfs -put *.txt /tmp
# wildcards: ? * {} []
hdfs dfs -cat /tmp/*.txt
Hello, Hadoop
Hello, HDFS
hdfs dfs -cat /tmp/h?fs.txt
Hello, HDFS
hdfs dfs -cat /tmp/h{a,d}*.txt
Hello, Hadoop
Hello, HDFS
hdfs dfs -cat /tmp/h[a-d]*.txt
Hello, Hadoop
Hello, HDFS
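The download direction is symmetric. A minimal sketch of -get, -copyToLocal and -getmerge using the files uploaded above; the local destination paths are illustrative, and the -getmerge source is given here as a file glob (a directory works as well):
hdfs dfs -get /tmp/hadoop.txt /tmp/hadoop.from_hdfs.txt
hdfs dfs -copyToLocal /tmp/hdfs.txt /tmp/hdfs.from_hdfs.txt
# merge every .txt under /tmp into one local file, appending a newline after each file
hdfs dfs -getmerge -nl /tmp/*.txt /tmp/merged.txt
cat /tmp/merged.txt
Hello, Hadoop
Hello, HDFS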
——————————————————————————–
-ls [-d] [-h] [-R] [path …] :
List the contents that match the specified file pattern. If path is not
specified, the contents of /user/currentUser will be listed. Directory entries
are of the form:
permissions - userId groupId sizeOfDirectory(in bytes)
modificationDate(yyyy-MM-dd HH:mm) directoryName
and file entries are of the form:
permissions numberOfReplicas userId groupId sizeOfFile(in bytes)
modificationDate(yyyy-MM-dd HH:mm) fileName
-d Directories are listed as plain files.
-h Formats the sizes of files in a human-readable fashion rather than a number
of bytes.
-R Recursively list the contents of directories.
hdfs dfs -ls /tmp
hdfs dfs -ls -d /tmp
hdfs dfs -ls -h /tmp
Found 4 items
-rw-r--r--   3 hdfs supergroup   14 2014-12-18 10:00 /tmp/hadoop.txt
-rw-r--r--   3 hdfs supergroup   12 2014-12-18 10:00 /tmp/hdfs.txt
-rw-r--r--   3 hdfs supergroup  1 G 2014-12-18 10:19 /tmp/test.zero
drwxr-xr-x   - hdfs supergroup    0 2014-12-18 10:07 /tmp/txt
hdfs dfs -ls -R -h /tmp
-rw-r--r--   3 hdfs supergroup   14 2014-12-18 10:00 /tmp/hadoop.txt
-rw-r--r--   3 hdfs supergroup   12 2014-12-18 10:00 /tmp/hdfs.txt
-rw-r--r--   3 hdfs supergroup  1 G 2014-12-18 10:19 /tmp/test.zero
drwxr-xr-x   - hdfs supergroup    0 2014-12-18 10:07 /tmp/txt
drwxr-xr-x   - hdfs supergroup    0 2014-12-18 10:07 /tmp/txt/hello
——————————————————————————–
-checksum src … :
Dump checksum information for files that match the file pattern src to stdout.
Note that this requires a round-trip to a datanode storing each block of the
file, and thus is not efficient to run on a large number of files. The checksum
of a file depends on its content, block size and the checksum algorithm and
parameters used for creating the file.
hdfs dfs -checksum /tmp/test.zero
/tmp/test.zero  MD5-of-262144MD5-of-512CRC32C  000002000000000000040000f960570129a4ef3a7e179073adceae97
——————————————————————————–
-appendToFile localsrc … dst :
Appends the contents of all the given local files to the given dst file. The dst
file will be created if it does not exist. If localSrc is -, then the input is
read from stdin.
hdfs dfs -appendToFile *.txt hello.txt
hdfs dfs -cat hello.txt
Hello, Hadoop
Hello, HDFS
——————————————————————————–
-tail [-f] file :
Show the last 1KB of the file.
hdfs dfs -tail -f hello.txt
# waits for new output; press Ctrl + C to stop
# in another terminal:
hdfs dfs -appendToFile - hello.txt
# then type something
——————————————————————————–
-cp [-f] [-p | -p[topax]] src … dst :
Copy files that match the file pattern src to a destination. When copying
multiple files, the destination must be a directory. Passing -p preserves status
[topax] (timestamps, ownership, permission, ACLs, XAttr). If -p is specified
with no arg, then preserves timestamps, ownership, permission. If -pa is
specified, then preserves permission also, because ACL is a super-set of
permission. Passing -f overwrites the destination if it already exists. raw
namespace extended attributes are preserved if (1) they are supported (HDFS
only) and, (2) all of the source and target pathnames are in the /.reserved/raw
hierarchy. raw namespace xattr preservation is determined solely by the presence
(or absence) of the /.reserved/raw prefix and not by the -p option.
-mv src … dst :
Move files that match the specified file pattern src to a destination dst.
When moving multiple files, the destination must be a directory.
-rm [-f] [-r|-R] [-skipTrash] src … :
Delete all files that match the specified file pattern. Equivalent to the Unix
command “rm src”
-skipTrash option bypasses trash, if enabled, and immediately deletes src
-f If the file does not exist, do not display a diagnostic message or
modify the exit status to reflect an error.
-[rR] Recursively deletes directories
-stat [format] path … :
Print statistics about the file/directory at path in the specified format.
Format accepts filesize in blocks (%b), group name of owner(%g), filename (%n),
block size (%o), replication (%r), user name of owner(%u), modification date
(%y, %Y)
hdfs dfs -stat /tmp/hadoop.txt
2014-12-18 02:00:08
hdfs dfs -cp -p -f /tmp/hadoop.txt /tmp/hadoop.txt.bak
hdfs dfs -stat /tmp/hadoop.txt.bak
hdfs dfs -rm /tmp/not_exists
rm: `/tmp/not_exists': No such file or directory
echo $?
1
hdfs dfs -rm -f /tmp/123321123123123
echo $?
0
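-mv has no example above; a minimal sketch, reusing the backup file produced by the -cp example:
hdfs dfs -mv /tmp/hadoop.txt.bak /tmp/txt/
hdfs dfs -ls /tmp/txt
hdfs dfs -mv /tmp/txt/hadoop.txt.bak /tmp/hadoop.txt.bak   # move it back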
——————————————————————————–
-count [-q] path … :
Count the number of directories, files and bytes under the paths
that match the specified file pattern. The output columns are:
DIR_COUNT FILE_COUNT CONTENT_SIZE FILE_NAME or
QUOTA REMAINING_QUOTA SPACE_QUOTA REMAINING_SPACE_QUOTA
DIR_COUNT FILE_COUNT CONTENT_SIZE FILE_NAME
-du [-s] [-h] path … :
Show the amount of space, in bytes, used by the files that match the specified
file pattern. The following flags are optional:
-s Rather than showing the size of each individual file that matches the
pattern, shows the total (summary) size.
-h Formats the sizes of files in a human-readable fashion rather than a number
of bytes.
Note that, even without the -s option, this only shows size summaries one level
deep into a directory.
The output is in the form
size name(full path)
hdfs dfs -count /tmp
           3            3         1073741850 /tmp
hdfs dfs -du /tmp
14          /tmp/hadoop.txt
12          /tmp/hdfs.txt
1073741824  /tmp/test.zero
0           /tmp/txt
hdfs dfs -du -s /tmp
1073741850  /tmp
hdfs dfs -du -s -h /tmp
1.0 G  /tmp
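The -q variant described above prepends the four quota columns; a quick sketch, assuming no quota has been set on /tmp:
hdfs dfs -count -q /tmp
# prints QUOTA REMAINING_QUOTA SPACE_QUOTA REMAINING_SPACE_QUOTA followed by the
# three counters and the path; with no quota configured the quota columns read as unlimited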
——————————————————————————–
-chgrp [-R] GROUP PATH… :
This is equivalent to -chown … :GROUP …
-chmod [-R] MODE[,MODE]… | OCTALMODE PATH… :
Changes permissions of a file. This works similar to the shell’s chmod command
with a few exceptions.
-R modifies the files recursively. This is the only option currently
supported.
MODE Mode is the same as mode used for the shell’s command. The only
letters recognized are ‘rwxXt’, e.g. +t,a+r,g-w,+rwx,o=r.
OCTALMODE Mode specified in 3 or 4 digits. If 4 digits, the first may be 1 or
0 to turn the sticky bit on or off, respectively. Unlike the
shell command, it is not possible to specify only part of the
mode, e.g. 754 is same as u=rwx,g=rx,o=r.
If none of ‘augo’ is specified, ‘a’ is assumed and unlike the shell command, no
umask is applied.
-chown [-R] [OWNER][:[GROUP]] PATH… :
Changes owner and group of a file. This is similar to the shell’s chown command
with a few exceptions.
-R modifies the files recursively. This is the only option currently
supported.
If only the owner or group is specified, then only the owner or group is
modified. The owner and group names may only consist of digits, alphabet, and
any of [-_./@a-zA-Z0-9]. The names are case sensitive.
WARNING: Avoid using ‘.’ to separate user name and group though Linux allows it.
If user names have dots in them and you are using local file system, you might
see surprising results since the shell command ‘chown’ is used for local files.
-touchz path … :
Creates a file of zero length at path with current time as the timestamp of
that path. An error is returned if the file exists with non-zero length
hdfs dfs -mkdir -p /user/spark/tmp
hdfs dfs -chown -R spark:hadoop /user/spark
hdfs dfs -chmod -R 775 /user/spark/tmp
hdfs dfs -ls -d /user/spark/tmp
drwxrwxr-x   - spark hadoop   0 2014-12-18 14:51 /user/spark/tmp
hdfs dfs -chmod +t /user/spark/tmp
# user: spark
hdfs dfs -touchz /user/spark/tmp/own_by_spark
# user: hadoop
useradd -g hadoop hadoop
su - hadoop
id
uid=502(hadoop) gid=492(hadoop) groups=492(hadoop)
hdfs dfs -rm /user/spark/tmp/own_by_spark
rm: Permission denied by sticky bit setting: user=hadoop, inode=own_by_spark
# a superuser (dfs.permissions.superusergroup = hdfs) can bypass the sticky bit setting
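-chgrp, the shorthand for -chown :GROUP, works the same way; a minimal sketch reusing the hadoop group created above:
hdfs dfs -chgrp -R hadoop /user/spark/tmp
hdfs dfs -ls -d /user/spark/tmp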
——————————————————————————–
-test -[defsz] path :
Answer various questions about path, with result via exit status.
-d return 0 if path is a directory.
-e return 0 if path exists.
-f return 0 if path is a file.
-s return 0 if file path is greater than zero bytes in size.
-z return 0 if file path is zero bytes in size, else return 1.
hdfs dfs -test -d /tmp
echo $?
0
hdfs dfs -test -f /tmp/txt
echo $?
1
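The remaining flags behave the same way; for example, with hadoop.txt uploaded earlier (14 bytes, so -z reports a non-zero size):
hdfs dfs -test -e /tmp/hadoop.txt
echo $?
0
hdfs dfs -test -z /tmp/hadoop.txt
echo $?
1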
——————————————————————————–
-setrep [-R] [-w] rep path … :
Set the replication level of a file. If path is a directory then the command
recursively changes the replication factor of all files under the directory tree
rooted at path.
-w It requests that the command waits for the replication to complete. This
can potentially take a very long time.
hdfs fsck /tmp/test.zero -blocks -locations
 Average block replication:     3.0
hdfs dfs -setrep -w 4 /tmp/test.zero
Replication 4 set: /tmp/test.zero
Waiting for /tmp/test.zero .... done
hdfs fsck /tmp/test.zero -blocks
 Average block replication:     4.0
Source: http://debugo.com, original article: http://debugo.com/hdfs-cmd1/. Thanks to the original author for sharing.
