Due to the inherent characteristics of the log itself, records are inserted sequentially from left to right, which means that the records on the left are "older" than the records on the right. In other words, we do not need to rely on the system clock. This feature is very important for distribution. Very important for the system.
It is impossible to know when the log appeared. It may be that it is too simple in concept. In the database field, logs are more used to synchronize data and indexes when the system crashes, such as redo log in MySQL. Redo log is a disk-based data structure used to ensure data when the system hangs. The correctness and completeness of the system are also called write-ahead logs. For example, during the execution of a thing, the redo log will be written first, and then the actual changes will be applied. In this way, when the system recovers after a crash, it can be recreated based on the redo log. Put it back to restore the data (during the initialization process, there will be no client connection at this time). The log can also be used for synchronization between the database master and slave, because essentially, all operation records of the database have been written to the log. We only need to synchronize the log to the slave and replay it on the slave to achieve master-slave synchronization. Many other required components can also be implemented here. We can obtain all changes in the database by subscribing to the redo log, thereby implementing personalized business logic, such as auditing, cache synchronization, etc.
Distributed system services are essentially about state changes, which can be understood as state machines. Two independent processes (not dependent on the external environment, such as system clocks, external interfaces, etc.) will produce consistent outputs given consistent inputs. And ultimately maintain a consistent state, and the log does not rely on the system clock due to its inherent sequence, which can be used to solve the problem of change order.
We use this feature to solve many problems encountered in distributed systems. For example, in the standby node in RocketMQ, the main broker receives the client's request and records the log, and then synchronizes it to the slave in real time. The slave replays it locally. When the master hangs up, the slave can continue to process the request, such as rejecting the write request and continuing. Handle read requests. The log can not only record data, but also directly record operations, such as SQL statements.
The log is the key data structure to solve the consistency problem. The log is like an operation sequence. Each record represents an instruction. For example, the widely used Paxos and Raft protocols are all consistency protocols built based on the log.
日志可以很方便的用于处理数据之间的流入流出,每一个数据源都可以产生自己的日志,这里数据源可以来自各个方面,例如某个事件流(页面点击、缓存刷新提醒、数据库binlog变更),我们可以将日志集中存储到一个集群中,订阅者可以根据offset来读取日志的每条记录,根据每条记录中的数据、操作应用自己的变更。
这里的日志可以理解为消息队列,消息队列可以起到异步解耦、限流的作用。为什么说解耦呢?因为对于消费者、生产者来说,两个角色的职责都很清晰,就负责生产消息、消费消息,而不用关心下游、上游是谁,不管是来数据库的变更日志、某个事件也好,对于某一方来说我根本不需要关心,我只需要关注自己感兴趣的日志以及日志中的每条记录。
我们知道数据库的QPS是一定的,而上层应用一般可以横向扩容,这个时候如果到了双11这种请求突然的场景,数据库会吃不消,那么我们就可以引入消息队列,将每个队数据库的操作写到日志中,由另外一个应用专门负责消费这些日志记录并应用到数据库中,而且就算数据库挂了,当恢复的时候也可以从上次消息的位置继续处理(RocketMQ和Kafka都支持Exactly Once语义),这里即使生产者的速度异于消费者的速度也不会有影响,日志在这里起到了缓冲的作用,它可以将所有的记录存储到日志中,并定时同步到slave节点,这样消息的积压能力能够得到很好的提升,因为写日志都是有master节点处理,读请求这里分为两种,一种是tail-read,就是说消费速度能够跟得上写入速度的,这种读可以直接走缓存,而另一种也就是落后于写入请求的消费者,这种可以从slave节点读取,这样通过IO隔离以及操作系统自带的一些文件策略,例如pagecache、缓存预读等,性能可以得到很大的提升。
分布式系统中可横向扩展是一个相当重要的特性,加机器能解决的问题都不是问题。那么如何实现一个能够实现横向扩展的消息队列呢? 假如我们有一个单机的消息队列,随着topic数目的上升,IO、CPU、带宽等都会逐渐成为瓶颈,性能会慢慢下降,那么这里如何进行性能优化呢?
日志在分布式系统中扮演了很重要的角色,是理解分布式系统各个组件的关键,随着理解的深入,我们发现很多分布式中间件都是基于日志进行构建的,例如Zookeeper、HDFS、Kafka、RocketMQ、Google Spanner等等,甚至于数据库,例如Redis、MySQL等等,其master-slave都是基于日志同步的方式,依赖共享的日志系统,我们可以实现很多系统: 节点间数据同步、并发更新数据顺序问题(一致性问题)、持久性(系统crash时能够通过其他节点继续提供服务)、分布式锁服务等等,相信慢慢的通过实践、以及大量的论文阅读之后,一定会有更深层次的理解。
以上是Linux下高效的日志库的应用的详细内容。更多信息请关注PHP中文网其他相关文章!