Home  >  Article  >  System Tutorial  >  The core of distributed systems - logs

The core of distributed systems - logs

王林
王林forward
2024-02-12 16:09:16674browse
What is a log?

A log is a completely ordered record sequence appended in chronological order. It is actually a special file format. The file is a byte array, and the log here is a record data, but relative to the file, here Each record is arranged in relative order in time. It can be said that the log is the simplest storage model. Reading is generally from left to right. For example, message queues generally write linearly to the log file in consumer order. Start reading from offset.
Due to the inherent characteristics of the log itself, records are inserted sequentially from left to right, which means that the records on the left are "older" than the records on the right. In other words, we do not need to rely on the system clock. This feature is very useful for distributed Very important to the system.
The core of distributed systems - logs

Log application
Application of logs in database

It is impossible to know when the log appeared. It may be that it is too simple in concept. In the database field, logs are more used to synchronize data and indexes when the system crashes, such as redo log in MySQL. Redo log is a disk-based data structure used to ensure data when the system hangs. The correctness and completeness of the system are also called write-ahead logs. For example, during the execution of a thing, the redo log will be written first, and then the actual changes will be applied. In this way, when the system recovers after a crash, it can be recreated based on the redo log. Put it back to restore the data (during the initialization process, there will be no client connection at this time). The log can also be used for synchronization between the database master and slave, because essentially, all operation records of the database have been written to the log. We only need to synchronize the log to the slave and replay it on the slave to achieve master-slave synchronization. Many other required components can also be implemented here. We can obtain all changes in the database by subscribing to the redo log, thereby implementing personalized business logic, such as auditing, cache synchronization, etc.

Application of logs in distributed systems

The core of distributed systems - logs
Distributed system services are essentially about state changes, which can be understood as state machines. Two independent processes (not dependent on the external environment, such as system clocks, external interfaces, etc.) will produce consistent outputs given consistent inputs. And ultimately maintain a consistent state, and the log does not rely on the system clock due to its inherent sequence, which can be used to solve the problem of change order.
We use this feature to solve many problems encountered in distributed systems. For example, in the standby node in RocketMQ, the main broker receives the client's request and records the log, and then synchronizes it to the slave in real time. The slave replays it locally. When the master hangs up, the slave can continue to process the request, such as rejecting the write request and continuing. Handle read requests. The log can not only record data, but also directly record operations, such as SQL statements.
The core of distributed systems - logs
The log is the key data structure to solve the consistency problem. The log is like a sequence of operations. Each record represents an instruction. For example, the widely used Paxos and Raft protocols are all consistency protocols built based on the log.
The core of distributed systems - logs

Application of logs in Message Queue

Logs can be easily used to process the inflow and outflow of data. Each data source can generate its own log. The data sources here can come from various aspects, such as a certain event stream (page click, cache refresh reminder, Database binlog changes), we can centrally store logs in a cluster, and subscribers can read each record of the log based on offset, and apply their own changes based on the data and operations in each record.
The log here can be understood as a message queue, and the message queue can play the role of asynchronous decoupling and current limiting. Why do we say decoupling? Because for consumers and producers, the responsibilities of the two roles are very clear, they are responsible for producing messages and consuming messages, without caring about who is downstream or upstream, whether it is the change log of the database or a certain event. I don't need to care about a certain party at all. I only need to pay attention to the logs that interest me and each record in the logs.
The core of distributed systems - logs

We know that the QPS of the database is certain, and upper-layer applications can generally expand horizontally. At this time, if there is a sudden request scenario like Double 11, the database will be overwhelmed, then we can introduce message queues and add each team database The operations are written to the log, and another application is responsible for consuming these log records and applying them to the database. Even if the database hangs, you can continue processing from the position of the last message when recovering (both RocketMQ and Kafka support Exactly Once semantics), even if the speed of the producer is different from the speed of the consumer, there will be no impact. The log plays the role of a buffer here. It can store all records in the log and synchronize to the slave node regularly, so that The backlog capacity of messages can be greatly improved because writing logs is processed by the master node. Read requests are divided into two types, one is tail-read, which means that the consumption speed can keep up with the writing speed. One type of read can go directly to the cache, and the other type is a consumer that lags behind the write request. This type can be read from the slave node, so that through IO isolation and some file policies that come with the operating system, such as pagecache, cache pre- Reading, etc., performance can be greatly improved.
The core of distributed systems - logs

Horizontal scalability is a very important feature in a distributed system. Problems that can be solved by adding machines are not a problem. So how to implement a message queue that can achieve horizontal expansion? If we have a stand-alone message queue, as the number of topics increases, IO, CPU, bandwidth, etc. will gradually become bottlenecks, and performance will slowly decrease. So how to proceed here? What about performance optimization?

  1. topic/log sharding. Essentially, the messages written by topic are log records. As the number of writes increases, a single machine will slowly become a bottleneck. At this time, we can divide a single topic into multiple sub- topic, and allocate each topic to different machines. In this way, those topics with a large amount of messages can be solved by adding machines, while some topics with a small amount of messages can be assigned to the same machine or No partitioning
  2. Group commit, such as Kafka's producer client, when writing a message, first writes it to a local memory queue, then summarizes the message according to each partition and node, and submits it in batches. For the server side or broker side, You can also use this method to write to the pagecache first, and then flush the disk regularly. The method of flushing can be determined according to the business. For example, financial services may adopt a synchronous flushing method.
  3. Avoid useless data copies
  4. IO Isolation
    The core of distributed systems - logs
Conclusion

Logs play a very important role in distributed systems and are the key to understanding various components of distributed systems. As our understanding deepens, we find that many distributed middleware are built based on logs, such as Zookeeper, HDFS, Kafka, RocketMQ, Google Spanner, etc., and even databases such as Redis, MySQL, etc., their master-slave is based on log synchronization. Relying on the shared log system, we can implement many systems: Data synchronization between nodes , concurrent update data order issues (consistency issues), persistence (the system can continue to provide services through other nodes when the system crashes), distributed lock services, etc. I believe that after slowly practicing and reading a lot of papers, I will definitely understand Have a deeper understanding.

The above is the detailed content of The core of distributed systems - logs. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:linuxprobe.com. If there is any infringement, please contact admin@php.cn delete