Because of the inherent nature of a log, records are appended sequentially from left to right, which means records on the left are always "older" than records on the right. In other words, ordering comes from the log itself; we do not need to rely on the system clock. This property is very important for distributed systems.
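As a minimal illustrative sketch (not production code), an append-only log gives each record an offset, and the offset alone establishes which record is older, with no timestamps involved:

```python
# A minimal append-only log sketch. A record's offset alone tells us it is
# "older" than every record with a larger offset -- no system clock needed.

class AppendOnlyLog:
    def __init__(self):
        self._records = []

    def append(self, record):
        """Append at the tail and return the record's offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset):
        return self._records[offset]

log = AppendOnlyLog()
a = log.append("set x=1")
b = log.append("set x=2")
assert a < b  # ordering comes from position in the log, not timestamps
```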
It is hard to say when the log first appeared; perhaps the concept is simply too old and too simple. In the database field, logs are mostly used to restore data and indexes after a crash, for example the redo log in MySQL. The redo log is a disk-based data structure used to guarantee the correctness and completeness of data when the system goes down; it is also called a write-ahead log (WAL). During a transaction, the redo log is written first, and the actual changes are applied afterward. When the system recovers after a crash, it can replay the redo log to restore the data (this happens during initialization, before any client connections are accepted). The log can also be used for synchronization between a database master and its slaves: since essentially all of the database's operation records are written to the log, we only need to ship the log to the slave and replay it there to achieve master-slave synchronization. Many other components can be built on the same idea: by subscribing to the redo log we can observe every change in the database and implement custom business logic such as auditing or cache synchronization.
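The write-ahead rule above can be sketched with a toy database (hypothetical code, not MySQL's actual redo log format): append the redo record before applying the change, and rebuild state by replaying the log on recovery.

```python
# Toy write-ahead-log sketch. The rule: append the redo record durably
# *before* applying the change; after a crash, replaying the log rebuilds
# the in-memory state.

class ToyDatabase:
    def __init__(self):
        self.redo_log = []   # stands in for a durable, fsync'ed log file
        self.data = {}       # in-memory table state

    def set(self, key, value):
        self.redo_log.append(("set", key, value))  # 1) log first
        self.data[key] = value                     # 2) then apply

    def recover(self):
        """Rebuild state purely from the redo log, as done at startup."""
        self.data = {}
        for op, key, value in self.redo_log:
            if op == "set":
                self.data[key] = value

db = ToyDatabase()
db.set("x", 1)
db.set("x", 2)
db.data = {}   # simulate losing all in-memory state in a crash
db.recover()
assert db.data == {"x": 2}
```

The same replay loop is exactly what a slave would run against a shipped log, which is why one mechanism serves both crash recovery and replication.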
Distributed system services are essentially about state changes and can be modeled as state machines. Two independent processes (ones that do not depend on the external environment, such as the system clock or external interfaces) will produce consistent output given consistent input, and will ultimately converge to a consistent state. Because of its inherent ordering, the log does not depend on the system clock, so it can be used to solve the problem of ordering changes.
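This determinism is easy to demonstrate in a sketch: two replicas that apply the same log, in log order, end in the same state, with no coordination or clocks.

```python
# Sketch: two independent state machines fed the same log converge to the
# same state, because the log fixes the order of changes.

def apply_log(log_records):
    """Deterministically fold a log of (key, value) writes into a state."""
    state = {}
    for key, value in log_records:
        state[key] = value
    return state

shared_log = [("x", 1), ("y", 2), ("x", 3)]
replica_a = apply_log(shared_log)
replica_b = apply_log(shared_log)
assert replica_a == replica_b == {"x": 3, "y": 2}
```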
We use this property to solve many problems in distributed systems. Take the standby node in RocketMQ, for example: the master broker receives the client's request and records it in the log, then synchronizes the log to the slave in real time, and the slave replays it locally. When the master goes down, the slave can continue to serve requests, for instance rejecting write requests while continuing to handle reads. The log can record not only data but also operations directly, such as SQL statements.
The log is the key data structure for solving the consistency problem. A log is like a sequence of operations, with each record representing an instruction. The widely used Paxos and Raft protocols, for example, are consensus protocols built on top of a log.
Logs also make it easy to handle the inflow and outflow of data. Each data source can produce its own log, and the sources can come from many places, such as event streams (page clicks, cache invalidation notifications, database binlog changes). We can store the logs centrally in a cluster, and subscribers can read each record of the log by offset, applying their own changes based on the data and operation in each record.
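Offset-based subscription can be sketched as follows (modeled loosely on how Kafka-style consumers track their position; the names are illustrative): each subscriber keeps its own offset into a shared log, so independent readers see the same records without interfering with each other.

```python
# Sketch of offset-based subscription: each subscriber tracks its own
# position in a shared log and applies records independently.

log = ["click:/home", "binlog:update users", "cache:invalidate user:42"]

class Subscriber:
    def __init__(self, name):
        self.name = name
        self.offset = 0      # each subscriber tracks its own position
        self.seen = []

    def poll(self, log):
        """Read and apply all records from the current offset onward."""
        while self.offset < len(log):
            self.seen.append(log[self.offset])
            self.offset += 1

audit = Subscriber("audit")
cache = Subscriber("cache-sync")
audit.poll(log)
cache.poll(log)
assert audit.seen == cache.seen == log   # independent readers, same records
```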
The log here can be understood as a message queue, and a message queue provides asynchronous decoupling and rate limiting. Why decoupling? Because the responsibilities of producers and consumers are clearly separated: one produces messages and the other consumes them, without caring who is upstream or downstream, or whether the source is a database change log or some event. Each party only needs to pay attention to the logs that interest it and the records within them.
We know that the QPS a database can sustain is fixed, while upper-layer applications can generally scale horizontally. In a sudden-burst scenario like Double 11, the database would be overwhelmed. We can introduce a message queue: each application writes its database operations to the log, and another application is responsible for consuming these log records and applying them to the database. Even if the database goes down, processing can resume from the position of the last message when it recovers (both RocketMQ and Kafka support Exactly-Once semantics). Even if the producer's speed differs from the consumer's, there is no adverse impact; the log acts as a buffer. It can store all records and synchronize them to slave nodes periodically, which greatly increases the backlog capacity, because writing to the log is handled by the master node. Read requests come in two kinds. One is the tail read, where consumption keeps up with the write speed; these reads can be served directly from cache. The other is the lagging read, which falls behind the writes; these can be served from slave nodes. With IO isolation and the operating system's own file-level facilities, such as the page cache and read-ahead, performance can be greatly improved.
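The resume-after-crash behavior can be sketched with a committed offset (an illustrative model of the idea behind at-least-once and exactly-once delivery, not any broker's actual API): the consumer commits its position after applying each record, so a restart continues from the last committed offset rather than reprocessing or losing messages.

```python
# Sketch: the log as a buffer between a fast producer and a slower consumer.
# The consumer commits its offset after applying each record, so after a
# crash it resumes from the last committed position.

log = [("insert", i) for i in range(10)]   # producer wrote 10 operations
committed_offset = 0
applied = []

def consume(upto):
    """Apply records up to `upto`, committing the offset as we go."""
    global committed_offset
    for i in range(committed_offset, upto):
        applied.append(log[i])
        committed_offset = i + 1

consume(4)          # consumer applies 4 records, then "crashes"
consume(len(log))   # on restart, it resumes from committed_offset=4
assert applied == log          # every record applied exactly once
```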
Horizontal scalability is a very important property of a distributed system: a problem that can be solved by adding machines is not really a problem. So how do we implement a message queue that scales horizontally? With a single-machine message queue, as the number of topics grows, IO, CPU, bandwidth, and so on gradually become bottlenecks and performance slowly degrades. How can we optimize performance here?
- Topic/log sharding (partitioning). Essentially, the messages written to a topic are records of the log. As write volume grows, a single machine slowly becomes a bottleneck. We can split a single topic into multiple partitions and assign each partition to a different machine. Topics with heavy message traffic can then be scaled by adding machines, while topics with little traffic can be placed on the same machine or left unpartitioned.
- Group commit. Kafka's producer client, for example, first writes messages to a local in-memory queue, then aggregates them by partition and node and submits them in batches. The server (broker) side can use the same approach: write to the page cache first and flush to disk periodically. The flushing policy can be chosen per business requirement; financial services, for instance, may use synchronous flushing.
- Avoid unnecessary data copies
- IO Isolation
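The sharding idea above can be sketched as key-based partition routing plus a partition-to-broker assignment (broker names and counts here are made up for illustration):

```python
# Sketch: route messages to partitions by key, and spread a topic's
# partitions across brokers so write load scales with machines.

def partition_for(key, num_partitions):
    """Pick a partition by hashing the message key (Kafka-style routing)."""
    return hash(key) % num_partitions  # hash() is process-salted; fine for a sketch

brokers = ["broker-0", "broker-1", "broker-2"]
num_partitions = 6
# Round-robin each partition onto a broker:
assignment = {p: brokers[p % len(brokers)] for p in range(num_partitions)}

p = partition_for("user-42", num_partitions)
assert 0 <= p < num_partitions
assert assignment[0] == "broker-0" and assignment[3] == "broker-0"
```

Messages with the same key always land in the same partition, which preserves per-key ordering while letting different keys spread across machines.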
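Group commit can be sketched as a buffer that flushes in batches, trading a little latency for far fewer writes (loosely modeled on a Kafka producer's per-partition batching; the class and sizes are illustrative):

```python
# Sketch of group commit: messages accumulate in an in-memory buffer and
# are flushed in one batch, so N messages cost one "disk write".

class BatchingProducer:
    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = []
        self.flushes = []   # each entry is one batched "disk write"

    def send(self, msg):
        self.buffer.append(msg)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.flushes.append(list(self.buffer))  # one write for N messages
            self.buffer.clear()

p = BatchingProducer(batch_size=4)
for i in range(10):
    p.send(f"msg-{i}")
p.flush()  # flush the remainder (like a synchronous flush on commit)
assert [len(b) for b in p.flushes] == [4, 4, 2]   # 10 messages, 3 writes
```

A synchronous-flush policy, as a financial service might require, simply calls `flush()` after every `send()`.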
Logs play a very important role in distributed systems and are key to understanding their various components. As our understanding deepens, we find that much distributed middleware is built on logs, such as ZooKeeper, HDFS, Kafka, RocketMQ, and Google Spanner, and even databases such as Redis and MySQL, whose master-slave replication is based on log shipping. Relying on a shared log system, we can implement many things: data synchronization between nodes, ordering of concurrent updates (the consistency problem), durability (other nodes can continue to provide service when one crashes), distributed lock services, and more. I believe that through practice and reading a large number of papers, one will reach deeper levels of understanding.
The above is the detailed content of Application of efficient log library under Linux. For more information, please follow other related articles on the PHP Chinese website!
