search
HomeJavajavaTutorialHow to operate HDFS using Java API?

How to operate HDFS using Java API?

Apr 19, 2023 pm 02:28 PM
javaapihdfs

1. Traverse all files and folders in the current directory

You can use the listStatus method to achieve the above requirements.
The signature of the listStatus method is as follows

  /**
   * List the statuses of the files/directories in the given path if the path is
   * a directory.
   * 
   * @param f given path
   * @return the statuses of the files/directories in the given patch
   * @throws FileNotFoundException when the path does not exist;
   *         IOException see specific implementation
   */
  public abstract FileStatus[] listStatus(Path f) throws FileNotFoundException, 
                                                         IOException;

It can be seen that listStatus only needs to pass in the parameter Path, and an array of FileStatus is returned.
FileStatus contains the following information

/** Interface that represents the client side information for a file.
 */
@InterfaceAudience.Public
@InterfaceStability.Stable
public class FileStatus implements Writable, Comparable {

  private Path path;
  private long length;
  private boolean isdir;
  private short block_replication;
  private long blocksize;
  private long modification_time;
  private long access_time;
  private FsPermission permission;
  private String owner;
  private String group;
  private Path symlink;
  ....

It is not difficult to see from FileStatus, including file path, size, whether it is a directory, block_replication, blocksize... and other information.

import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}
import org.apache.spark.sql.SparkSession
import org.apache.spark.{SparkConf, SparkContext}
import org.slf4j.LoggerFactory

object HdfsOperation {
	
	val logger = LoggerFactory.getLogger(this.getClass)
	
	def tree(sc: SparkContext, path: String) : Unit = {
		val fs = FileSystem.get(sc.hadoopConfiguration)
		val fsPath = new Path(path)
		val status = fs.listStatus(fsPath)
		for(filestatus:FileStatus <- status) {
			logger.error("getPermission is: {}", filestatus.getPermission)
			logger.error("getOwner is: {}", filestatus.getOwner)
			logger.error("getGroup is: {}", filestatus.getGroup)
			logger.error("getLen is: {}", filestatus.getLen)
			logger.error("getModificationTime is: {}", filestatus.getModificationTime)
			logger.error("getReplication is: {}", filestatus.getReplication)
			logger.error("getBlockSize is: {}", filestatus.getBlockSize)
			if (filestatus.isDirectory) {
				val dirpath = filestatus.getPath.toString
				logger.error("文件夹名字为: {}", dirpath)
				tree(sc, dirpath)
			} else {
				val fullname = filestatus.getPath.toString
				val filename = filestatus.getPath.getName
				logger.error("全部文件名为: {}", fullname)
				logger.error("文件名为: {}", filename)
			}
		}
	}
}

If it is determined that fileStatus is a folder, the tree method is called recursively to achieve the purpose of traversing all.

2. Traverse all files

The above method is to traverse all files and folders. If you just want to iterate over files, you can use the listFiles method.

	def findFiles(sc: SparkContext, path: String) = {
		val fs = FileSystem.get(sc.hadoopConfiguration)
		val fsPath = new Path(path)
		val files = fs.listFiles(fsPath, true)
		while(files.hasNext) {
			val filestatus = files.next()
			val fullname = filestatus.getPath.toString
			val filename = filestatus.getPath.getName
			logger.error("全部文件名为: {}", fullname)
			logger.error("文件名为: {}", filename)
			logger.error("文件大小为: {}", filestatus.getLen)
		}
	}
  /**
   * List the statuses and block locations of the files in the given path.
   * 
   * If the path is a directory, 
   *   if recursive is false, returns files in the directory;
   *   if recursive is true, return files in the subtree rooted at the path.
   * If the path is a file, return the file&#39;s status and block locations.
   * 
   * @param f is the path
   * @param recursive if the subdirectories need to be traversed recursively
   *
   * @return an iterator that traverses statuses of the files
   *
   * @throws FileNotFoundException when the path does not exist;
   *         IOException see specific implementation
   */
  public RemoteIterator<LocatedFileStatus> listFiles(
      final Path f, final boolean recursive)
  throws FileNotFoundException, IOException {
  ...

As can be seen from the source code, listFiles returns an iterable objectRemoteIterator<locatedfilestatus></locatedfilestatus>, and listStatus returns an array. At the same time, listFiles returns all files.

3.Create folder

	def mkdirToHdfs(sc: SparkContext, path: String) = {
		val fs = FileSystem.get(sc.hadoopConfiguration)
		val result = fs.mkdirs(new Path(path))
		if (result) {
			logger.error("mkdirs already success!")
		} else {
			logger.error("mkdirs had failed!")
		}
	}

4.Delete folder

	def deleteOnHdfs(sc: SparkContext, path: String) = {
		val fs = FileSystem.get(sc.hadoopConfiguration)
		val result = fs.delete(new Path(path), true)
		if (result) {
			logger.error("delete already success!")
		} else {
			logger.error("delete had failed!")
		}
	}

5.Upload file

	def uploadToHdfs(sc: SparkContext, localPath: String, hdfsPath: String): Unit = {
		val fs = FileSystem.get(sc.hadoopConfiguration)
		fs.copyFromLocalFile(new Path(localPath), new Path(hdfsPath))
		fs.close()
	}

6.Download file

	def downloadFromHdfs(sc: SparkContext, localPath: String, hdfsPath: String) = {
		val fs = FileSystem.get(sc.hadoopConfiguration)
		fs.copyToLocalFile(new Path(hdfsPath), new Path(localPath))
		fs.close()
	}

The above is the detailed content of How to operate HDFS using Java API?. For more information, please follow other related articles on the PHP Chinese website!

Statement
This article is reproduced at:亿速云. If there is any infringement, please contact admin@php.cn delete
JVM performance vs other languagesJVM performance vs other languagesMay 14, 2025 am 12:16 AM

JVM'sperformanceiscompetitivewithotherruntimes,offeringabalanceofspeed,safety,andproductivity.1)JVMusesJITcompilationfordynamicoptimizations.2)C offersnativeperformancebutlacksJVM'ssafetyfeatures.3)Pythonisslowerbuteasiertouse.4)JavaScript'sJITisles

Java Platform Independence: Examples of useJava Platform Independence: Examples of useMay 14, 2025 am 12:14 AM

JavaachievesplatformindependencethroughtheJavaVirtualMachine(JVM),allowingcodetorunonanyplatformwithaJVM.1)Codeiscompiledintobytecode,notmachine-specificcode.2)BytecodeisinterpretedbytheJVM,enablingcross-platformexecution.3)Developersshouldtestacross

JVM Architecture: A Deep Dive into the Java Virtual MachineJVM Architecture: A Deep Dive into the Java Virtual MachineMay 14, 2025 am 12:12 AM

TheJVMisanabstractcomputingmachinecrucialforrunningJavaprogramsduetoitsplatform-independentarchitecture.Itincludes:1)ClassLoaderforloadingclasses,2)RuntimeDataAreafordatastorage,3)ExecutionEnginewithInterpreter,JITCompiler,andGarbageCollectorforbytec

JVM: Is JVM related to the OS?JVM: Is JVM related to the OS?May 14, 2025 am 12:11 AM

JVMhasacloserelationshipwiththeOSasittranslatesJavabytecodeintomachine-specificinstructions,managesmemory,andhandlesgarbagecollection.ThisrelationshipallowsJavatorunonvariousOSenvironments,butitalsopresentschallengeslikedifferentJVMbehaviorsandOS-spe

Java: Write Once, Run Anywhere (WORA) - A Deep Dive into Platform IndependenceJava: Write Once, Run Anywhere (WORA) - A Deep Dive into Platform IndependenceMay 14, 2025 am 12:05 AM

Java implementation "write once, run everywhere" is compiled into bytecode and run on a Java virtual machine (JVM). 1) Write Java code and compile it into bytecode. 2) Bytecode runs on any platform with JVM installed. 3) Use Java native interface (JNI) to handle platform-specific functions. Despite challenges such as JVM consistency and the use of platform-specific libraries, WORA greatly improves development efficiency and deployment flexibility.

Java Platform Independence: Compatibility with different OSJava Platform Independence: Compatibility with different OSMay 13, 2025 am 12:11 AM

JavaachievesplatformindependencethroughtheJavaVirtualMachine(JVM),allowingcodetorunondifferentoperatingsystemswithoutmodification.TheJVMcompilesJavacodeintoplatform-independentbytecode,whichittheninterpretsandexecutesonthespecificOS,abstractingawayOS

What features make java still powerfulWhat features make java still powerfulMay 13, 2025 am 12:05 AM

Javaispowerfulduetoitsplatformindependence,object-orientednature,richstandardlibrary,performancecapabilities,andstrongsecurityfeatures.1)PlatformindependenceallowsapplicationstorunonanydevicesupportingJava.2)Object-orientedprogrammingpromotesmodulara

Top Java Features: A Comprehensive Guide for DevelopersTop Java Features: A Comprehensive Guide for DevelopersMay 13, 2025 am 12:04 AM

The top Java functions include: 1) object-oriented programming, supporting polymorphism, improving code flexibility and maintainability; 2) exception handling mechanism, improving code robustness through try-catch-finally blocks; 3) garbage collection, simplifying memory management; 4) generics, enhancing type safety; 5) ambda expressions and functional programming to make the code more concise and expressive; 6) rich standard libraries, providing optimized data structures and algorithms.

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Dreamweaver Mac version

Dreamweaver Mac version

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

WebStorm Mac version

WebStorm Mac version

Useful JavaScript development tools

Atom editor mac version download

Atom editor mac version download

The most popular open source editor

DVWA

DVWA

Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software