


Big data operation and maintenance
1. HDFS distributed file system operation and maintenance
1. In the root directory of the HDFS file system, recursively create the directory "1daoyun/file", upload the BigDataSkills.txt file from the attachment to the 1daoyun/file directory, and use the relevant command to list the files in the 1daoyun/file directory.
hadoop fs -mkdir -p /1daoyun/file
hadoop fs -put BigDataSkills.txt /1daoyun/file
hadoop fs -ls /1daoyun/file
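The three commands above can be wrapped in a small shell function. This is a minimal sketch under stated assumptions:
- The function name upload_and_list and the HADOOP_CMD variable are hypothetical and not part of the exercise.
- The -f flag of -put (overwrite an existing copy) is added so that the step can be re-run safely.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recursively create an HDFS directory, upload one local
# file into it, and list the directory (the same steps as task 1).
# HADOOP_CMD defaults to "hadoop"; HADOOP_CMD=echo turns the function into a
# dry run that only prints the commands it would execute.
upload_and_list() {
  local src="$1"        # local file, e.g. BigDataSkills.txt
  local dest_dir="$2"   # HDFS directory, e.g. /1daoyun/file
  local cmd="${HADOOP_CMD:-hadoop}"

  "$cmd" fs -mkdir -p "$dest_dir" || return 1
  # -f overwrites an existing copy, so re-running the helper does not fail
  "$cmd" fs -put -f "$src" "$dest_dir" || return 1
  "$cmd" fs -ls "$dest_dir"
}
```

For example, `upload_and_list BigDataSkills.txt /1daoyun/file` runs the same three steps against the cluster. `HADOOP_CMD=echo upload_and_list BigDataSkills.txt /1daoyun/file` only prints the commands, which is a cheap way to check the paths before touching HDFS.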
2. In the root directory of the HDFS file system, recursively create the directory "1daoyun/file", upload the BigDataSkills.txt file from the attachment to the 1daoyun/file directory, and use the HDFS file system check tool (fsck) to check whether the file is damaged.
hadoop fs -mkdir -p /1daoyun/file
hadoop fs -put BigDataSkills.txt /1daoyun/file
hadoop fsck /1daoyun/file/BigDataSkills.txt
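To use this check inside a script, one option is to read the summary line that fsck prints at the end of its report. That line ends with "is HEALTHY" for an undamaged path and "is CORRUPT" for a damaged one. The sketch below assumes this wording of the summary line, and fsck_status is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify an HDFS fsck report read from stdin.
# Assumption: the report ends with a summary line such as
#   The filesystem under path '/some/path' is HEALTHY
# (or "... is CORRUPT" when blocks are missing or corrupt).
fsck_status() {
  local report
  report="$(cat)"   # read the whole report once so it can be searched twice

  if printf '%s\n' "$report" | grep -q "is CORRUPT"; then
    echo "CORRUPT"
    return 1
  elif printf '%s\n' "$report" | grep -q "is HEALTHY"; then
    echo "HEALTHY"
    return 0
  fi
  echo "UNKNOWN"    # e.g. the path does not exist or the report was cut off
  return 2
}
```

For example, `hadoop fsck /1daoyun/file/BigDataSkills.txt | fsck_status` should print HEALTHY and return 0 when the file is intact. The non-zero return codes let a maintenance script stop on a damaged file.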
3. In the root directory of the HDFS file system, recursively create the directory "1daoyun/file" and upload the BigDataSkills.txt file from the attachment to it. During the upload, set the HDFS replication factor of BigDataSkills.txt to 2. Then use the fsck tool to check the number of replicas of the file's storage blocks.
hadoop fs -mkdir -p /1daoyun/file
hadoop fs -D dfs.replication=2 -put BigDataSkills.txt /1daoyun/file
hadoop fsck /1daoyun/file/BigDataSkills.txt
4. The root directory of the HDFS file system contains a directory named /apps. Enable snapshot creation for this directory and create a snapshot of it named apps_1daoyun. Then use the relevant command to list the snapshot files.
hadoop dfsadmin -allowSnapshot /apps
hadoop fs -createSnapshot /apps apps_1daoyun
hadoop fs -ls /apps/.snapshot
5. When a Hadoop cluster starts, it first enters safe mode and leaves it automatically, by default 30 seconds after enough blocks have been reported. While the system is in safe mode, the HDFS file system is read-only: files cannot be written, modified, or deleted. Suppose the Hadoop cluster needs maintenance: put the cluster into safe mode and check its status.
hdfs dfsadmin -safemode enter
hdfs dfsadmin -safemode get
6. To protect against accidental deletion, the HDFS file system provides a recycle bin (trash), but a large number of junk files takes up a lot of storage space. In the web interface of the Xiandian big data platform, set the interval after which files in the HDFS recycle bin are permanently deleted to 7 days. The parameter is given in minutes, so 7 days = 7 × 24 × 60 = 10080 minutes.
Advanced core-site > fs.trash.interval: 10080
7. To protect against accidental deletion, the HDFS file system provides a recycle bin function, but too many junk files take up a lot of storage space.
Use the "vi" command in the Linux shell to edit the corresponding configuration file and parameter so that the recycle bin function is turned off, then restart the corresponding service. A value of 0 for fs.trash.interval disables the recycle bin.
Advanced core-site > fs.trash.interval: 0
vi /etc/hadoop/2.4.3.0-227/0/core-site.xml
(set the value of the fs.trash.interval property to 0)
sbin/stop-dfs.sh
sbin/start-dfs.sh
8. Under certain circumstances, hosts in a Hadoop cluster may go down or suffer system damage. Once such problems occur, data files in the HDFS file system can be damaged or lost. To ensure the reliability of the HDFS file system, change the cluster's replication factor to 5 in the web interface of the Xiandian big data platform.
General > Block replication: 5
9. Under certain circumstances, hosts in a Hadoop cluster may go down or suffer system damage. Once such problems occur, data files in the HDFS file system can be damaged or lost. To ensure the reliability of the HDFS file system, change the cluster's replication factor to 5. Use the "vi" command in the Linux shell to edit the corresponding configuration file and parameter, then restart the corresponding service.
vi /etc/hadoop/2.4.3.0-227/0/hdfs-site.xml
(set the value of the dfs.replication property to 5)
/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf stop {namenode/datanode}
/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start {namenode/datanode}
10. Use a command to view the number of directories, the number of files, and the total file size under the /tmp directory of the HDFS file system. The output columns are the directory count, the file count, the total content size in bytes, and the path name.
hadoop fs -count /tmp
2. MapReduce case questions
1. On the cluster node, the /usr/hdp/2.4.3.0-227/hadoop-mapreduce/ directory contains a sample JAR package, hadoop-mapreduce-examples.jar. Run the PI program in the JAR package to calculate an approximate value of π, using 5 Map tasks with 5 throws (samples) per Map task.
cd /usr/hdp/2.4.3.0-227/hadoop-mapreduce/
hadoop jar hadoop-mapreduce-examples-2.7.1.2.4.3.0-227.jar pi 5 5
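A short worked estimate helps explain why 5 maps with 5 samples each gives only a rough value. This assumes the pi example works as its source (the QuasiMonteCarlo class in hadoop-mapreduce-examples) describes it. The program places sample points in a unit square, counts how many fall inside the inscribed circle, and scales that ratio by 4:

$$
\hat{\pi} = 4 \cdot \frac{N_{\text{inside}}}{N_{\text{total}}}, \qquad N_{\text{total}} = N_{\text{maps}} \times N_{\text{samples}} = 5 \times 5 = 25
$$

With 25 points the estimate can only change in steps of $4/25 = 0.16$. The achievable values closest to π are therefore $4 \cdot 19/25 = 3.04$ and $4 \cdot 20/25 = 3.20$. Raising either argument shrinks the step; for example, `pi 10 1000` gives a step of $4/10000 = 0.0004$.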
2. On the cluster node, the /usr/hdp/2.4.3.0-227/hadoop-mapreduce/ directory contains a sample JAR package, hadoop-mapreduce-examples.jar. Run the wordcount program in the JAR package to count the words in the /1daoyun/file/BigDataSkills.txt file and write the results to the /1daoyun/output directory. Then use the relevant command to query the word count results.
hadoop jar /usr/hdp/2.4.3.0-227/hadoop-mapreduce/hadoop-mapreduce-examples-2.7.1.2.4.3.0-227.jar wordcount /1daoyun/file/BigDataSkills.txt /1daoyun/output
hadoop fs -cat /1daoyun/output/part-r-00000
3. On the cluster node, the /usr/hdp/2.4.3.0-227/hadoop-mapreduce/ directory contains a sample JAR package, hadoop-mapreduce-examples.jar. Run the sudoku program in the JAR package to solve the Sudoku puzzle given in the table (stored in the file puzzle1.dta).
cat puzzle1.dta
hadoop jar hadoop-mapreduce-examples-2.7.1.2.4.3.0-227.jar sudoku puzzle1.dta
4. On the cluster node, the /usr/hdp/2.4.3.0-227/hadoop-mapreduce/ directory contains a sample JAR package, hadoop-mapreduce-examples.jar. Run the grep program in the JAR package to count the number of times "Hadoop" appears in the /1daoyun/file/BigDataSkills.txt file. When the job finishes, query the result.
hadoop jar hadoop-mapreduce-examples-2.7.1.2.4.3.0-227.jar grep /1daoyun/file/BigDataSkills.txt /output Hadoop
hadoop fs -cat /output/part-r-00000
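When the result files are large, it helps to pull out a single count instead of reading the whole listing. The sketch below rests on an assumption about the output layout of the bundled WordCount and Grep examples: tab-separated records, with wordcount writing `word<TAB>count` and grep writing `count<TAB>match`. The helper names wordcount_lookup and grep_lookup are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading job results piped from "hadoop fs -cat".
# Assumed output layouts (tab-separated, one record per line):
#   wordcount:  word<TAB>count
#   grep:       count<TAB>matched-text

# Print the count recorded for WORD in wordcount output on stdin (0 if absent).
wordcount_lookup() {
  awk -F'\t' -v w="$1" '$1 == w { print $2; found = 1 } END { if (!found) print 0 }'
}

# Print the count recorded for MATCH in grep-example output on stdin (0 if absent).
grep_lookup() {
  awk -F'\t' -v m="$1" '$2 == m { print $1; found = 1 } END { if (!found) print 0 }'
}
```

Usage:
- `hadoop fs -cat /1daoyun/output/part-r-00000 | wordcount_lookup Hadoop`
- `hadoop fs -cat /output/part-r-00000 | grep_lookup Hadoop`

The two numbers need not agree. wordcount splits lines on whitespace only, so a token such as "Hadoop," is counted as a separate word. The grep regular expression Hadoop also matches inside longer tokens.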
The above is the detailed content of BigData big data operation and maintenance. For more information, please follow other related articles on the PHP Chinese website!


