Distributed databases include: 1. Elasticsearch, whose shards can live on a single node or on multiple nodes; 2. Redis, which supports rich data types; 3. MongoDB, whose document model makes data easier to retrieve; 4. MySQL distributed cluster, which offers high availability.
Distributed databases include:
1. Elasticsearch database
1. Introduction to Elasticsearch
Elasticsearch is a distributed, real-time document store in which every field is indexed and searchable, and it is also a distributed, real-time search and analytics engine.
It can scale out to hundreds of servers and handle petabytes of structured or unstructured data.
2. Elasticsearch application scenarios
A distributed search and data analytics engine: full-text search, structured search, and data analysis.
Near-real-time processing of massive data sets, site search (e-commerce, recruitment, portals, etc.), internal IT system search (OA, CRM, ERP, etc.), and data analysis.
3. Advantages and disadvantages of Elasticsearch
Disadvantages: no built-in user authentication or permission control in older releases, no concept of transactions, no rollback support, accidentally deleted data cannot be restored, and a Java runtime is required.
Advantages: Documents are split into separate containers, called shards, which can live on a single node or on multiple nodes.
Each shard is replicated to provide a backup copy of the data, so that a hardware failure does not cause data loss.
A request sent to any node in the cluster is routed to the nodes that hold the data you need. New nodes are integrated seamlessly as the cluster grows, and shards are redistributed without stopping the cluster to recover the data of a lost node.
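The sharding and routing idea above can be sketched in a few lines of Python. This is only an illustration of hash-modulo routing: Elasticsearch uses its own hash function and extra routing details, so `zlib.crc32` and the helper names `pick_shard` and `distribute` are assumptions made for this sketch.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (for example a document ID) to a primary shard number."""
    # zlib.crc32 is a stand-in hash chosen for this sketch only.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def distribute(doc_ids, num_primary_shards):
    """Group document IDs by the shard each one would be routed to."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards


if __name__ == "__main__":
    # Any node can compute the same mapping, so any node can forward a request.
    print(distribute([f"doc-{i}" for i in range(10)], 3))
```

Because the mapping depends only on the key and the number of primary shards, every node computes the same answer, which is why a request can enter the cluster through any node.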
4. Elasticsearch persistence solution
The gateway is the persistent storage mechanism for an Elasticsearch index. In the early Elasticsearch releases that exposed this setting, index data is first kept in memory and persisted to disk when memory fills up; a queue also writes index data to disk automatically when the system is idle. When the cluster is shut down and restarted, index data is read back from the gateway. Several gateway types were supported: the local file system (the default), distributed file systems, Hadoop's HDFS, and Amazon's S3 cloud storage service.
2. Redis database
1. Introduction to Redis
Redis is an open-source (originally BSD-licensed) advanced key-value storage system (NoSQL). Besides simple key-value data, it can store strings, hashes, lists, sets, and sorted sets (zset), so it is often used as a data structure server. Redis supports persistence: it can save in-memory data to disk and load it again on restart. Redis also supports data backup in master-slave mode.
2. Redis application scenarios
A) Counters: follower counts, number of Weibo posts
B) Frequently changing user information
C) Caching, for example as a cache in front of MySQL
D) Queue systems: priority queues and log collection systems
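As a rough illustration of the counter and cache scenarios above, the sketch below uses a small in-memory class in place of a real Redis server so that it runs on its own. `TinyCache`, `get_user`, and the dictionary standing in for MySQL are all hypothetical names invented for this example; they are not part of any Redis client library.

```python
import time


class TinyCache:
    """In-memory stand-in for a key-value cache with optional per-key expiry."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires_at = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # drop the expired key lazily on access
            return None
        return value

    def incr(self, key):
        """Counter scenario: increment and return the new value."""
        value = (self.get(key) or 0) + 1
        self.set(key, value)
        return value


def get_user(cache, db, user_id, ttl=60):
    """Cache scenario: try the cache first, fall back to the slower store."""
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is None:
        user = db[user_id]  # cache miss: read from the backing store
        cache.set(key, user, ttl)
    return user


if __name__ == "__main__":
    cache, db = TinyCache(), {1: {"name": "alice"}}
    print(get_user(cache, db, 1), cache.incr("followers:1"))
```

With a real deployment the same pattern would go through a Redis client, and the expiry would be enforced by the server rather than checked on read.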
3. Advantages and disadvantages of Redis
Advantages:
(1) It is fast because the data is held in memory, much like a HashMap, whose lookups and updates take O(1) time on average
(2) It supports rich data types: string, list, set, sorted set, and hash
(3) It supports transactions, and individual operations are atomic: a command either takes effect completely or not at all. Note that Redis does not roll back commands in a transaction that have already run if a later one fails
(4) Rich features: it can be used for caching and messaging, and keys can be given an expiration time, after which they are deleted automatically
Disadvantages:
(1) Redis master-slave replication by itself has no automatic fault tolerance or recovery. When the master or a slave goes down, some front-end read and write requests fail until the machine restarts or the front-end IP is switched manually
(2) If the master goes down, some data may not have been synchronized to the slave in time before the failure. After the IP is switched, this introduces data inconsistency and reduces the availability of the system
(3) Master-slave replication in Redis uses full replication. During replication, the master forks a child process that takes a snapshot of memory, saves it to a file, and sends the file to the slave. The master must have enough free memory for this, and a large snapshot file noticeably affects the cluster's ability to serve requests. A full copy is performed whenever a slave newly joins the cluster or reconnects after a network disconnection, so network fluctuations can trigger full data copies between master and slave, which causes considerable trouble in production. (Since Redis 2.8, a reconnecting slave can attempt a partial resynchronization instead.)
(4) Online scaling is difficult in Redis. Once the cluster reaches its capacity limit, expanding it online becomes very complicated. To avoid this, operations staff must provision enough space when the system goes live, which wastes a lot of resources
4. Redis persistence solution
Redis provides two persistence methods. One is RDB persistence, which periodically dumps the in-memory Redis data set to disk. The other is AOF (append-only file) persistence, which appends Redis's operation log to a file.
RDB persistence writes a snapshot of the in-memory data set to disk at specified intervals. In practice, Redis forks a child process that first writes the data set to a temporary file; once the write succeeds, the temporary file replaces the previous one. The file is stored in a compressed binary format.
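The difference between the two methods can be sketched with a toy key-value store. This is a simplified illustration, not how Redis is implemented: the class `TinyStore`, the file names, and the JSON encoding are assumptions made for the example, and no child process is forked.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store with snapshot (RDB-style) and log (AOF-style) persistence."""

    def __init__(self, directory):
        self.data = {}
        self.snapshot_path = os.path.join(directory, "dump.json")
        self.aof_path = os.path.join(directory, "appendonly.log")

    def set(self, key, value):
        # AOF style: append the operation to the log, then apply it in memory.
        with open(self.aof_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(["set", key, value]) + "\n")
        self.data[key] = value

    def save_snapshot(self):
        # RDB style: write to a temporary file first, then replace the old
        # snapshot in one step so a crash never leaves a half-written file.
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(self.snapshot_path))
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(self.data, tmp)
        os.replace(tmp_path, self.snapshot_path)

    @classmethod
    def from_snapshot(cls, directory):
        store = cls(directory)
        with open(store.snapshot_path, encoding="utf-8") as f:
            store.data = json.load(f)
        return store

    @classmethod
    def from_aof(cls, directory):
        store = cls(directory)
        with open(store.aof_path, encoding="utf-8") as log:
            for line in log:  # replay every logged operation in order
                op, key, value = json.loads(line)
                if op == "set":
                    store.data[key] = value
        return store


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        s = TinyStore(d)
        s.set("a", 1)
        s.save_snapshot()
        s.set("b", 2)  # written after the snapshot, so only the log has it
        print(TinyStore.from_snapshot(d).data, TinyStore.from_aof(d).data)
```

The snapshot only reflects the state at the moment it was taken, while replaying the log recovers every write, which mirrors the usual trade-off between the two methods.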
3. MongoDB database
1. Introduction to MongoDB
MongoDB itself is a non-relational database. Each of its records is a Document, and each Document consists of a set of key-value pairs. Documents in MongoDB are similar to JSON objects. The values of fields in Document may include other Documents, arrays, etc.
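A document of the kind described above can be pictured as a nested Python dictionary. The field names and values below are hypothetical and chosen only for illustration; no MongoDB driver is involved.

```python
import json

# A hypothetical order document: key-value pairs whose values may be
# other documents (customer) or arrays of documents (items).
order = {
    "_id": "order-1001",
    "customer": {"name": "Alice", "city": "Shanghai"},
    "items": [
        {"sku": "A1", "qty": 2},
        {"sku": "B7", "qty": 1},
    ],
    "paid": True,
}


def total_quantity(doc):
    """Sum the quantities across the embedded item documents."""
    return sum(item["qty"] for item in doc["items"])


if __name__ == "__main__":
    print(json.dumps(order, indent=2))
    print(total_quantity(order))
```

Because related data is embedded in one document, reading an order does not require joining several tables.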
2. MongoDB application scenarios
The main goal of MongoDB is to bridge key/value stores (which offer high performance and high scalability) and traditional RDBMS systems (which offer rich functionality), combining the best of both. MongoDB is suitable for the following scenarios:
a. Website data: MongoDB is well suited to real-time inserts, updates, and queries, and it provides the replication and high scalability that real-time website data storage requires.
b. Caching: Thanks to its high performance, MongoDB also works well as a caching layer in an information infrastructure. After a system restart, the persistent cache built on MongoDB keeps the underlying data source from being overloaded.
c. Large, low-value data: Storing some kinds of data in a traditional relational database can be expensive. Before MongoDB, many programmers simply stored such data in plain files.
d. High-scalability scenarios: MongoDB is well suited to databases made up of dozens or hundreds of servers.
e. Storing objects and JSON data: MongoDB's BSON data format is well suited to storing and querying documents.
3. Advantages and disadvantages of MongoDB
Advantages:
(1) Weak consistency (eventual consistency), which better preserves user access speed
(2) The document-structured storage model makes data more convenient to retrieve
(3) Built-in GridFS supports large-capacity storage
(4) In one use case with tens of millions of document objects and nearly 10 GB of data, queries on the indexed ID were no slower than in MySQL, while queries on non-indexed fields were faster across the board
Disadvantages:
(1) No transaction support (multi-document transactions were only added in MongoDB 4.0)
(2) It occupies too much space, wasting disk
(3) Single-machine reliability is relatively poor
(4) When large amounts of data are inserted continuously, write performance fluctuates considerably
4. MongoDB's persistence solution and exception handling
When performing a write operation, MongoDB creates a journal entry containing the exact disk location and the changed bytes. If the server crashes suddenly, the journal is replayed at startup to reapply any write operations that had not been flushed to disk before the crash.
By default, data files are flushed to disk every 60 seconds, so the journal only needs to hold about 60 seconds of writes. The journal preallocates several empty files for this purpose, located in /data/db/journal and named j._0, j._1, and so on.
After MongoDB has been running for a long time, you will see files such as j._6217, j._6218, and j._6219 in the journal directory. These are the current journal files, and the numbers keep increasing for as long as MongoDB keeps running. When MongoDB shuts down cleanly, these files are removed, because the journal is no longer needed after a clean shutdown.
If the server crashes or is killed with kill -9, the journal files are replayed the next time MongoDB starts, and it prints lengthy, hard-to-read verification lines, which indicate a normal recovery.
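The journal mechanism described above (record the location and the changed bytes, then replay after a crash) can be sketched as follows. This is a conceptual model only: `JournaledFile` and its methods are hypothetical names, and everything lives in memory rather than in real data and journal files.

```python
def apply_write(data: bytearray, offset: int, payload: bytes) -> None:
    """Overwrite len(payload) bytes of data starting at offset."""
    data[offset:offset + len(payload)] = payload


class JournaledFile:
    """Toy model of a data file protected by a redo journal."""

    def __init__(self, size):
        self.on_disk = bytearray(size)    # state as of the last flush
        self.in_memory = bytearray(size)  # current state, not yet flushed
        self.journal = []                 # (offset, bytes) entries since the last flush

    def write(self, offset, payload):
        # Record where and what changed before touching the data itself.
        self.journal.append((offset, bytes(payload)))
        apply_write(self.in_memory, offset, payload)

    def flush(self):
        # Periodic flush: the data file catches up, so old entries are no longer needed.
        self.on_disk = bytearray(self.in_memory)
        self.journal.clear()

    def recover_after_crash(self):
        # After a crash the in-memory state is gone: start from the flushed
        # data and replay every journaled write in order.
        recovered = bytearray(self.on_disk)
        for offset, payload in self.journal:
            apply_write(recovered, offset, payload)
        return recovered


if __name__ == "__main__":
    f = JournaledFile(8)
    f.write(0, b"ab")
    f.flush()
    f.write(2, b"cd")  # not flushed yet, only journaled
    print(bytes(f.recover_after_crash()))
```

The journal only has to cover the writes made since the last flush, which matches the point above that it holds roughly one flush interval of data.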
4. MySQL distributed cluster
1. Introduction to MySQL distributed cluster
MySQL Cluster is a storage solution based on a shared-nothing, distributed node architecture; it aims to provide fault tolerance and high performance.
Updates use the read-committed isolation level to keep data consistent across all nodes, and a two-phase commit mechanism ensures that all nodes hold the same data (if any write operation fails, the whole update fails).
The shared-nothing peer nodes make an update performed on one server immediately visible on the others. Updates are propagated through a sophisticated communication mechanism designed for high throughput across the network.
Load is distributed across multiple MySQL servers to maximize application performance, and data is stored in different locations to ensure high availability and redundancy.
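The two-phase commit idea mentioned above can be sketched in Python. This is a minimal, single-process illustration of the protocol's shape, not MySQL Cluster's implementation; the `Node` class, its `fail_on_prepare` flag, and the function `two_phase_commit` are hypothetical.

```python
class Node:
    """Toy participant that can vote on and then apply a staged write."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare  # simulate a node that cannot accept the write
        self.data = {}
        self.staged = None

    def prepare(self, key, value):
        # Phase 1: stage the change and vote yes, or vote no.
        if self.fail_on_prepare:
            return False
        self.staged = (key, value)
        return True

    def commit(self):
        # Phase 2 (success path): make the staged change visible.
        key, value = self.staged
        self.data[key] = value
        self.staged = None

    def rollback(self):
        # Phase 2 (failure path): discard the staged change.
        self.staged = None


def two_phase_commit(nodes, key, value):
    """Apply the write on every node, or on none of them."""
    votes = [node.prepare(key, value) for node in nodes]
    if all(votes):
        for node in nodes:
            node.commit()
        return True
    for node in nodes:
        node.rollback()
    return False


if __name__ == "__main__":
    a, b = Node("a"), Node("b")
    print(two_phase_commit([a, b], "k", 1), a.data, b.data)
```

If a single participant votes no during the prepare phase, nothing is committed anywhere, which corresponds to the statement that a failed write fails the whole update.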
2. MySQL distributed cluster application scenarios
Solving mass-storage problems, such as the MySQL distributed cluster used by Jingdong (JD.com) B2B.
Suited to databases that serve billions of page views (PV).
3. Advantages and disadvantages of MySQL distributed cluster
Advantages:
a) High availability
b) Fast automatic failover
c) Flexible distributed architecture, no single point of failure
d) High throughput and low latency
e) Strong scalability, supports online expansion
Disadvantages:
a) There are many limitations; for example, older versions do not support foreign keys
b) Deployment, management, and configuration are complex
c) It takes up a lot of disk space and memory
d) Backup and recovery are inconvenient
e) On restart, data nodes take a long time to load data into memory
4. MySQL distributed cluster persistence solution
Load balancing.
Management node backup.
The above is the detailed content of How to understand what distributed databases are. For more information, please follow other related articles on the PHP Chinese website!
