What are the Linux distributed file systems?

Author: 藏色散人 (original)
Published: 2023-03-21 10:20:15

Linux distributed file systems include: 1. OpenAFS, an open source distributed file system; 2. MooseFS, a fault-tolerant network distributed file system; 3. GoogleFS (GFS), a scalable distributed file system; and others.

The operating environment of this tutorial: Linux 5.9.8, Dell G3 computer.

  • NFS (www.tldp.org/HOWTO/NFS-HOWTO/index.html)

The Network File System, also known as NFS, is a file-sharing protocol supported by Linux, FreeBSD, and most other Unix-like systems.

NFS allows a system to share directories and files with others on the network. By using NFS, users and programs can access files on remote systems as if they were local files. Its benefits are:

1. Local workstations use less disk space, because commonly used data can be stored on one machine and accessed over the network.

2. Users do not need a separate home directory on every machine on the network. Home directories can be placed on an NFS server and made available anywhere on the network.

3. Storage devices such as floppy drives, CD-ROM drives, and Zip drives can be used by other machines on the network. This can reduce the number of removable media drives needed across the network.

Implemented in C/C++; runs across platforms.
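
Because NFS directories appear as ordinary local paths, it can be useful to check which mount points are actually backed by a remote server. The following is a minimal Python sketch, assuming the Linux `/proc/mounts` layout (device, mount point, file system type, options). The names `NfsMount`, `nfs_mounts`, and the sample host `fileserver` are hypothetical, and IPv6 server addresses are not handled.

```python
from typing import List, NamedTuple


class NfsMount(NamedTuple):
    server: str       # host that exports the directory
    export: str       # directory path on the server
    mount_point: str  # where it appears on the local machine


def nfs_mounts(mounts_text: str) -> List[NfsMount]:
    """Return the NFS entries found in /proc/mounts-style text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field 0 is the device, field 1 the mount point, field 2 the type.
        if len(fields) < 3 or fields[2] not in ("nfs", "nfs4"):
            continue
        # NFS devices are written as "server:/exported/path".
        server, _, export = fields[0].partition(":")
        found.append(NfsMount(server, export, fields[1]))
    return found


# Hypothetical sample data: one local disk and one NFS-mounted /home.
SAMPLE = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "fileserver:/export/home /home nfs4 rw,relatime 0 0\n"
)

for mount in nfs_mounts(SAMPLE):
    print(f"{mount.mount_point} is served by {mount.server}:{mount.export}")
```

On a Linux machine the same function could be fed the contents of `/proc/mounts` instead of the sample string.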

  • OpenAFS (www.openafs.org)

OpenAFS is an open source distributed file system that lets systems share files and resources across local and wide area networks. OpenAFS is organized around groups of file servers called cells. The identity of each server is usually hidden within the file system: users logged in from an AFS client cannot tell which server they are working on, because from their perspective they appear to be working on a single system with familiar Unix file system semantics.

File system contents are usually replicated across the cell, so the failure of one hard disk does not impair the operation of OpenAFS clients. OpenAFS requires a large client cache of up to 1 GB to speed up access to frequently used files. It is a very secure Kerberos-based system that uses access control lists (ACLs) for fine-grained access control, which is not based on the usual Linux and Unix security model. License: IBM Public License; runs on Linux.

  • MooseFS (derf.homelinux.org)

Moose File System (MooseFS) is a fault-tolerant network distributed file system. Data is spread across different servers on the network, and MooseFS uses FUSE to present it as an ordinary Unix file system. One problem remains: it does not eliminate the single point of failure. MooseFS is implemented in C and runs across platforms.

  • pNFS (www.pnfs.com)

Network File System (NFS) is an important part of most local area networks (LANs). However, NFS was not suited to the demanding I/O-intensive programs found in high-performance computing, at least not until recently. A recent revision of the NFS standard (NFSv4.1) incorporates Parallel NFS (pNFS), a parallel implementation of file sharing that increases transfer rates by orders of magnitude.

Implemented in C/C++; runs on Linux.

  • GoogleFS (GFS)

GFS is reported to be a capable, scalable distributed file system for large distributed applications that access large amounts of data. It runs on inexpensive commodity hardware yet provides fault tolerance, and it can deliver high-performance service to a large number of users. It was developed in-house by Google.

Related extensions:

Commonly used distributed file systems include GFS, TFS, HDFS, MooseFS, FastDFS, MogileFS, GridFS, MinIO, SeaweedFS, Ceph, and GlusterFS.

Comparison of common distributed file systems

1. GFS (Google File System)
A scalable, Linux-based distributed file system developed by Google to meet its own needs. It is designed for large distributed applications that access large amounts of data. It is low-cost because it runs on inexpensive commodity hardware. However, it is not open source, so it is not considered further here.

2. TFS (Taobao File System)
A scalable, highly available, high-performance, open source distributed file system for Internet services, developed by Alibaba to meet Taobao’s needs for small-file storage. Aimed mainly at massive amounts of unstructured data, it is built on clusters of ordinary Linux machines and provides highly reliable, highly concurrent storage access to external clients. TFS provides Taobao with storage for a massive number of small files, usually no larger than 1 MB each, so it is not considered further here.

3. HDFS (Hadoop Distributed File System)
The Hadoop Distributed File System is designed to run on general-purpose hardware for distributed storage and computing, and offers high fault tolerance and scalability. It can be deployed on inexpensive machines, is well suited to big data processing, and is particularly strong at offline batch processing of large datasets.
Hadoop was created by Doug Cutting, the founder of Apache Lucene, a widely used text search library. It originated in Apache Nutch, an open source web search engine that was itself part of the Lucene project. Apache Hadoop is an open source implementation of the MapReduce model, which is an important cornerstone of Google's empire.

4. MooseFS
MooseFS is an open source, redundant, fault-tolerant distributed POSIX file system from Poland. Its architecture is modeled on GFS, and it implements most POSIX semantics and APIs. Files are accessed by mounting the file system through FUSE, and the web management interface it provides makes it very convenient to view the current storage status. It depends on a single master server. It is written in C and targets medium and large files, but its performance is relatively poor, so it is not considered here because real-time access may be required.
Note: POSIX stands for Portable Operating System Interface. The POSIX standard defines the interfaces that an operating system should provide to applications.

5. FastDFS
An open source distributed file system developed by Yu Qing of Taobao. It manages files, and its functions include file storage, file synchronization, and file access (upload and download), addressing the problems of large-capacity storage and load balancing. It suits online services built around files, such as photo album and video websites. FastDFS is tailor-made for the Internet: it takes redundant backup, load balancing, and linear scaling into account, and emphasizes high availability and high performance. It is used to build high-performance file server clusters that provide file upload, download, and related services. However, FastDFS is somewhat troublesome to deploy, and its SDK is incomplete.

6. MogileFS
MogileFS is a set of efficient open source components for automatic file replication, developed by Six Apart and widely used by Web 2.0 sites including LiveJournal. It supports multi-node redundancy and replicates files automatically. No RAID is needed: redundancy is implemented at the application layer in a shared-nothing design. It provides its service through a cluster interface, works at the application layer, has no special component requirements, and communicates over HTTP.

Sites known to use MogileFS include the image hosting site Yupoo, Digg, Tudou, Douban, Yihaodian (No. 1 Store), Dianping, Sogou, and Anjuke. Many of these sites store more than 30 TB of images.
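
Application-layer redundancy of this kind can be pictured as a tracker that places each file on several distinct storage hosts and notices when a failure leaves a file with too few copies. The following is a minimal Python sketch of that idea only; it is not MogileFS code or its API, and the names `choose_replica_hosts`, `copies_missing`, and the `store1`..`store4` hosts are hypothetical.

```python
import random
from typing import Dict, List, Set


def choose_replica_hosts(hosts: List[str], copies: int,
                         rng: random.Random) -> List[str]:
    """Pick `copies` distinct storage hosts for a new file."""
    if copies > len(hosts):
        raise ValueError("not enough hosts for the requested copy count")
    return rng.sample(hosts, copies)


def copies_missing(placement: Dict[str, Set[str]], alive: Set[str],
                   copies: int) -> Dict[str, int]:
    """For each file, report how many extra copies are needed after failures."""
    missing = {}
    for key, hosts in placement.items():
        live_copies = len(hosts & alive)  # copies on hosts that still respond
        if live_copies < copies:
            missing[key] = copies - live_copies
    return missing


hosts = ["store1", "store2", "store3", "store4"]
print(choose_replica_hosts(hosts, 2, random.Random(0)))

# Hypothetical placement table; store2 has failed.
placement = {
    "photo.jpg": {"store1", "store2"},
    "video.mp4": {"store3", "store4"},
}
print(copies_missing(placement, alive={"store1", "store3", "store4"}, copies=2))
```

A background job could use the second function's result to schedule new copies, which is what makes the replication automatic rather than dependent on RAID.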

7. GridFS
MongoDB is a well-known NoSQL database, and GridFS is a built-in feature of MongoDB. It is used to store and retrieve files that exceed 16 MB, the BSON document size limit (such as pictures, audio, and video). It is a form of file storage, but the data is kept in MongoDB collections. It can directly leverage MongoDB's established replication and sharding mechanisms, so failure recovery and scaling are easy for file storage, and GridFS does not produce disk fragmentation.
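
GridFS gets around the document size limit by splitting each file into chunks and storing the chunks as separate documents alongside one metadata document. The following is a minimal pure-Python sketch of that chunking idea, assuming GridFS's default chunk size of 255 KiB. It does not talk to MongoDB; the functions `split_into_chunks` and `reassemble` are hypothetical, and the dictionaries only imitate the general shape of the file and chunk documents.

```python
from typing import Dict, List, Tuple

CHUNK_SIZE = 255 * 1024  # assumed default GridFS chunk size, in bytes


def split_into_chunks(file_id: str, data: bytes,
                      chunk_size: int = CHUNK_SIZE) -> Tuple[Dict, List[Dict]]:
    """Build one metadata document and a list of numbered chunk documents."""
    files_doc = {"_id": file_id, "length": len(data), "chunkSize": chunk_size}
    chunks = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunks


def reassemble(chunks: List[Dict]) -> bytes:
    """Rebuild the file by joining chunks in order of their sequence number."""
    ordered = sorted(chunks, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


payload = bytes(600 * 1024)  # 600 KiB of zero bytes as stand-in file content
files_doc, chunks = split_into_chunks("example-file", payload)
print(files_doc["length"], len(chunks))
print(reassemble(chunks) == payload)
```

Because every chunk stays well under the document size limit, the file as a whole can be far larger than any single document.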

8. MinIO
MinIO is an open source object storage service, originally released under the Apache License v2.0 and under the GNU AGPLv3 since 2021. It is compatible with the Amazon S3 cloud storage interface and is well suited to storing large volumes of unstructured data such as pictures, videos, log files, backup data, and container or virtual machine images. An object can be of any size, from a few KB up to a maximum of 5 TB. It is also a very lightweight service that can easily be combined with other applications. MinIO is simple, lightweight, and developer-friendly, with a low learning cost and easy installation and operation, and it works out of the box.

9. SeaweedFS
SeaweedFS is a highly scalable open source distributed storage system written in Go. It can store billions of files (ultimately limited by the size of your disks), is fast, and uses little memory. Getting started is much easier than with FastDFS, and it comes with its own REST API. It is very efficient for small and medium-sized files, but the maximum capacity of a single volume is limited to 30 GB by the program. It is recommended to store files no larger than 100 MB.

10. Ceph
Ceph is a mature distributed file system backed by Red Hat, and also an object storage ecosystem with enterprise-grade features. It offers high performance, high availability, high scalability, and real-time storage. Although Ceph is very powerful, it has a high learning cost and is complicated to install, operate, and maintain. Ceph is written in C++, and its storage capacity can easily reach the petabyte range.

11. GlusterFS
GlusterFS is a POSIX-compatible distributed file system (open source under the GPL) developed by the American company Gluster. It is mainly used in cluster systems and offers high availability, high performance, and horizontal scalability. Because it is designed without a metadata server, the service as a whole has no single point of failure. The system is designed primarily for medium and large files, and its storage capacity can easily reach petabytes. Its drawbacks are that expanding or shrinking a cluster affects many servers, traversing the files in a directory is slow, and small-file performance is poor.
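
One way to see how a design without a metadata server can work, and why resizing a cluster touches many servers, is to compute each file's location directly from its name. The following is a minimal Python sketch under that assumption. The modulo-hash scheme, the functions `brick_for` and `moved_fraction`, and the server names are hypothetical simplifications and are not GlusterFS's actual distribution algorithm, which is more elaborate.

```python
import hashlib
from typing import Sequence


def brick_for(path: str, bricks: Sequence[str]) -> str:
    """Map a file path to a storage server using only a hash of the path."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(bricks)
    return bricks[index]


def moved_fraction(paths: Sequence[str], old_bricks: Sequence[str],
                   new_bricks: Sequence[str]) -> float:
    """Fraction of files whose server changes when the server list changes."""
    moved = sum(
        1 for path in paths
        if brick_for(path, old_bricks) != brick_for(path, new_bricks)
    )
    return moved / len(paths)


OLD = ["server1", "server2", "server3", "server4"]
NEW = OLD + ["server5"]  # the cluster grows by one server
PATHS = [f"/data/file{i}.bin" for i in range(1000)]

# Any client can locate a file without asking a central metadata service.
print(brick_for(PATHS[0], OLD))
# With this naive scheme, most files map to a different server after growth.
print(moved_fraction(PATHS, OLD, NEW))
```

Every client computes the same answer independently, so no lookup service can fail. The cost is that changing the server list changes many of those answers, which means data has to be rebalanced.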
