1. Traverse all files and folders in a given directory
You can use the listStatus method for this. Its signature is as follows:
/**
 * List the statuses of the files/directories in the given path if the path is
 * a directory.
 *
 * @param f given path
 * @return the statuses of the files/directories in the given path
 * @throws FileNotFoundException when the path does not exist;
 *         IOException see specific implementation
 */
public abstract FileStatus[] listStatus(Path f) throws FileNotFoundException, IOException;
As the signature shows, listStatus takes only a Path parameter and returns an array of FileStatus.
FileStatus contains the following information:
/** Interface that represents the client side information for a file. */
@InterfaceAudience.Public
@InterfaceStability.Stable
public class FileStatus implements Writable, Comparable {
  private Path path;
  private long length;
  private boolean isdir;
  private short block_replication;
  private long blocksize;
  private long modification_time;
  private long access_time;
  private FsPermission permission;
  private String owner;
  private String group;
  private Path symlink;
  ...
As the fields show, FileStatus carries the file path, length, whether the entry is a directory, block replication, block size, modification time, and other metadata.
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}
import org.apache.spark.sql.SparkSession
import org.apache.spark.{SparkConf, SparkContext}
import org.slf4j.LoggerFactory

object HdfsOperation {
  val logger = LoggerFactory.getLogger(this.getClass)

  def tree(sc: SparkContext, path: String): Unit = {
    val fs = FileSystem.get(sc.hadoopConfiguration)
    val fsPath = new Path(path)
    val status = fs.listStatus(fsPath)
    for (filestatus: FileStatus <- status) {
      logger.error("getPermission is: {}", filestatus.getPermission)
      logger.error("getOwner is: {}", filestatus.getOwner)
      logger.error("getGroup is: {}", filestatus.getGroup)
      logger.error("getLen is: {}", filestatus.getLen)
      logger.error("getModificationTime is: {}", filestatus.getModificationTime)
      logger.error("getReplication is: {}", filestatus.getReplication)
      logger.error("getBlockSize is: {}", filestatus.getBlockSize)
      if (filestatus.isDirectory) {
        val dirpath = filestatus.getPath.toString
        logger.error("directory name is: {}", dirpath)
        tree(sc, dirpath)  // recurse into the subdirectory
      } else {
        val fullname = filestatus.getPath.toString
        val filename = filestatus.getPath.getName
        logger.error("full file path is: {}", fullname)
        logger.error("file name is: {}", filename)
      }
    }
  }
}
Whenever a FileStatus entry turns out to be a directory, tree calls itself recursively, so the entire subtree under the starting path is traversed.
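To run the traversal you need a SparkContext. The minimal driver below is a sketch for illustration only; the app name and the target HDFS path are assumptions, not part of the original article.

import org.apache.spark.sql.SparkSession

object HdfsOperationApp {
  def main(args: Array[String]): Unit = {
    // Build a SparkSession and reuse its SparkContext for the HDFS calls.
    // The app name and "/tmp/example" path are assumed values.
    val spark = SparkSession.builder().appName("HdfsOperationExample").getOrCreate()
    HdfsOperation.tree(spark.sparkContext, "/tmp/example")
    spark.stop()
  }
}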
2. Traverse all files
The method above traverses both files and folders. If you want to iterate over files only, you can use the listFiles method.
def findFiles(sc: SparkContext, path: String) = {
  val fs = FileSystem.get(sc.hadoopConfiguration)
  val fsPath = new Path(path)
  val files = fs.listFiles(fsPath, true)
  while (files.hasNext) {
    val filestatus = files.next()
    val fullname = filestatus.getPath.toString
    val filename = filestatus.getPath.getName
    logger.error("full file path is: {}", fullname)
    logger.error("file name is: {}", filename)
    logger.error("file size is: {}", filestatus.getLen)
  }
}
The signature of listFiles is as follows:

/**
 * List the statuses and block locations of the files in the given path.
 *
 * If the path is a directory,
 *   if recursive is false, returns files in the directory;
 *   if recursive is true, return files in the subtree rooted at the path.
 * If the path is a file, return the file's status and block locations.
 *
 * @param f is the path
 * @param recursive if the subdirectories need to be traversed recursively
 *
 * @return an iterator that traverses statuses of the files
 *
 * @throws FileNotFoundException when the path does not exist;
 *         IOException see specific implementation
 */
public RemoteIterator<LocatedFileStatus> listFiles(
    final Path f, final boolean recursive)
    throws FileNotFoundException, IOException {
  ...
As the source shows, listFiles returns an iterable RemoteIterator<LocatedFileStatus>, whereas listStatus returns an array. In addition, with recursive set to true, listFiles returns every file in the subtree, not just the direct children.
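Because RemoteIterator does not implement Scala's Iterator, the while loop above is the most direct way to consume it. If you prefer Scala collection operations, a small adapter can bridge the two. The helper below is our own sketch, not part of the Hadoop API; it could live next to findFiles in HdfsOperation.

import org.apache.hadoop.fs.RemoteIterator

// Hypothetical adapter: wraps Hadoop's RemoteIterator in a scala.Iterator
// so listFiles results can be used with map/filter/foreach.
def asScalaIterator[T](it: RemoteIterator[T]): Iterator[T] = new Iterator[T] {
  override def hasNext: Boolean = it.hasNext
  override def next(): T = it.next()
}

// Usage sketch: log all files larger than 1 MB under a path
// asScalaIterator(fs.listFiles(new Path("/tmp"), true))
//   .filter(_.getLen > 1024 * 1024)
//   .foreach(f => logger.error("large file: {}", f.getPath))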
3. Create folder
def mkdirToHdfs(sc: SparkContext, path: String) = {
  val fs = FileSystem.get(sc.hadoopConfiguration)
  val result = fs.mkdirs(new Path(path))
  if (result) {
    logger.error("mkdirs succeeded!")
  } else {
    logger.error("mkdirs failed!")
  }
}
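mkdirs also has an overload that takes an FsPermission, useful when the new directory should not get the default permissions. A minimal sketch, assuming mode 755 is what you want (the effective permission is still subject to the client umask):

import org.apache.hadoop.fs.permission.FsPermission

def mkdirWithMode(sc: SparkContext, path: String): Boolean = {
  val fs = FileSystem.get(sc.hadoopConfiguration)
  // 755 (owner rwx, group r-x, other r-x) is an assumed mode for illustration
  val mode = new FsPermission(Integer.parseInt("755", 8).toShort)
  fs.mkdirs(new Path(path), mode)
}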
4. Delete folder
def deleteOnHdfs(sc: SparkContext, path: String) = {
  val fs = FileSystem.get(sc.hadoopConfiguration)
  val result = fs.delete(new Path(path), true)  // true = recursive delete
  if (result) {
    logger.error("delete succeeded!")
  } else {
    logger.error("delete failed!")
  }
}
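Note that delete(path, true) removes directories recursively; with false it fails on non-empty directories. delete generally just returns false when the path is missing, but an explicit exists check makes the intent clearer. A sketch:

def deleteIfExists(sc: SparkContext, path: String): Boolean = {
  val fs = FileSystem.get(sc.hadoopConfiguration)
  val p = new Path(path)
  // Only attempt the delete when the path is present; true = recursive
  if (fs.exists(p)) fs.delete(p, true) else false
}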
5. Upload file
def uploadToHdfs(sc: SparkContext, localPath: String, hdfsPath: String): Unit = {
  val fs = FileSystem.get(sc.hadoopConfiguration)
  fs.copyFromLocalFile(new Path(localPath), new Path(hdfsPath))
  fs.close()
}
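Two things are worth noting here. First, FileSystem.get returns a cached instance shared across the application, so calling fs.close() can break later callers that obtained the same instance; in long-lived jobs it is usually safer to leave it open. Second, copyFromLocalFile has an overload with delSrc/overwrite flags. A sketch of the overwriting variant:

def uploadOverwrite(sc: SparkContext, localPath: String, hdfsPath: String): Unit = {
  val fs = FileSystem.get(sc.hadoopConfiguration)
  // delSrc = false keeps the local copy; overwrite = true replaces an existing target
  fs.copyFromLocalFile(false, true, new Path(localPath), new Path(hdfsPath))
}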
6. Download file
def downloadFromHdfs(sc: SparkContext, localPath: String, hdfsPath: String) = {
  val fs = FileSystem.get(sc.hadoopConfiguration)
  fs.copyToLocalFile(new Path(hdfsPath), new Path(localPath))
  fs.close()
}
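By default, copyToLocalFile writes a side-car .crc checksum file next to the local target. If you do not want it, the four-argument overload can write through the raw local filesystem. A sketch:

def downloadWithoutCrc(sc: SparkContext, localPath: String, hdfsPath: String): Unit = {
  val fs = FileSystem.get(sc.hadoopConfiguration)
  // delSrc = false keeps the HDFS source; useRawLocalFileSystem = true skips the .crc file
  fs.copyToLocalFile(false, new Path(hdfsPath), new Path(localPath), true)
}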