ELK has become the most popular centralized logging solution today. Its main components — Beats, Logstash, Elasticsearch, and Kibana — together provide a one-stop solution for collecting, storing, and visualizing logs in real time. This article introduces the common ELK deployment architectures and the problems each one solves.

1. The ELK components
Filebeat: a lightweight data collection engine that consumes very few server resources. A newer member of the ELK family, it can replace Logstash as the log collector on application servers, and it supports outputting the collected data to queues such as Kafka and Redis.
Logstash: a data collection engine. It is heavier-weight than Filebeat, but it integrates a large number of plug-ins, supports a rich set of data sources, and can filter, analyze, and format the collected log data.
Elasticsearch: a distributed search engine built on Apache Lucene. It can be deployed as a cluster and provides centralized data storage and analysis, along with powerful search and aggregation capabilities.
Kibana: a data visualization platform. This web UI lets you view the data in Elasticsearch in real time and provides rich charting and statistics.
2. Common ELK deployment architectures

2.1. Logstash as the log collector
This is the most primitive deployment architecture: a Logstash instance is deployed on each application server as the log collector. Logstash filters, analyzes, and formats the collected data, sends it to Elasticsearch for storage, and Kibana is then used for visualization. The main shortcoming of this architecture is that Logstash consumes considerable server resources, which increases the load on the application servers.
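A minimal sketch of the Logstash pipeline this architecture implies (the file path and Elasticsearch address below are illustrative assumptions, not from the original article):

```
# Architecture 2.1 sketch: Logstash runs on the application server,
# reads local log files, and ships them directly to Elasticsearch.
input {
  file {
    path => "/var/log/app/*.log"        # example application log path
    start_position => "beginning"       # read existing content on first run
  }
}
output {
  elasticsearch {
    hosts => ["http://es-node:9200"]    # example Elasticsearch address
    index => "app-log-%{+YYYY.MM.dd}"   # one index per day
  }
}
```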
2.2. Filebeat as the log collector
The only difference from the first architecture is that the log collector on the application side is replaced by Filebeat. Because Filebeat is lightweight and consumes few server resources, it is used as the collector on the application servers, typically paired with a central Logstash. This is currently the most commonly used deployment.
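A sketch of the Filebeat side of this architecture, assuming a central Logstash listening on its Beats port (the paths, host name, and port are example values):

```yaml
# Architecture 2.2 sketch: Filebeat on the application server forwards
# log lines to a central Logstash over the Beats protocol.
filebeat.prospectors:            # "filebeat.inputs" in Filebeat 6.3+
  - type: log
    paths:
      - /var/log/app/*.log       # example application log path
output.logstash:
  hosts: ["logstash-host:5044"]  # example central Logstash (beats input)
```

On the Logstash side, the matching input is the beats plug-in listening on the same port.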
2.3. Deployment architecture with a cache queue
This architecture builds on the second one by introducing a Kafka message queue (any other message queue could also be used): Filebeat sends the collected data to Kafka, and Logstash then reads the data from Kafka. This architecture targets log collection at large data volumes; the cache queue mainly improves data safety and evens out the load on Logstash and Elasticsearch.
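The two sides of the Kafka hand-off can be sketched as follows (broker address and topic name are illustrative assumptions). Filebeat writes to Kafka:

```yaml
# Architecture 2.3 sketch, producer side: Filebeat ships events to Kafka
# instead of directly to Logstash.
output.kafka:
  hosts: ["kafka-broker:9092"]   # example broker address
  topic: "app-logs"              # example topic name
```

and Logstash reads from the same topic:

```
# Architecture 2.3 sketch, consumer side: Logstash consumes the topic,
# then filters and forwards to Elasticsearch as usual.
input {
  kafka {
    bootstrap_servers => "kafka-broker:9092"
    topics => ["app-logs"]
  }
}
```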
2.4. Summary of the three architectures
The first deployment architecture suffers from the resource-usage problem and is now rarely used; the second is currently the most widely adopted. As for the third, I personally feel there is usually no need to introduce a message queue unless you have other requirements, because even with large data volumes Filebeat uses a pressure-sensitive protocol when sending data to Logstash or Elasticsearch: if Logstash is busy processing data, it tells Filebeat to slow down its reads, and once the congestion is resolved, Filebeat resumes its original speed and continues sending data.

3. Problems and solutions
Problem: how to merge multi-line log entries?

Application logs are generally printed in a specific format, and a single log entry may span multiple lines. When collecting logs with ELK, the lines belonging to the same entry therefore need to be merged.
Solution: use the multiline merge plug-in in Filebeat or Logstash.
Note that how multiline is used depends on the ELK deployment architecture. With the first architecture in this article, multiline should be configured in Logstash; with the second, multiline should be configured in Filebeat, and there is no need to configure it in Logstash.
1. How to configure multiline in Filebeat (the pattern below is an illustrative example; adapt it to your log format):

multiline:
  pattern: '^\['
  negate: true
  match: after

Here pattern is the regular expression that marks the first line of a log entry, negate inverts the match, and match: after appends non-matching lines to the preceding entry.
2. How to configure multiline in Logstash:
(1) The what attribute value previous in Logstash is equivalent to after in Filebeat, and next is equivalent to before. (2) In pattern => "%{LOGLEVEL}\s*\]", LOGLEVEL is one of Logstash's predefined grok patterns. There are many more predefined patterns; for details see: https://github.com/logstash-p...
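Putting the points above together, a sketch of the Logstash-side multiline configuration (the file path is an example; the pattern is the one quoted in this article) uses the multiline codec on the input:

```
# Multiline merging in Logstash: lines that do not start a new entry
# are appended to the previous event (what => "previous").
input {
  file {
    path => "/var/log/app/*.log"       # example log path
    codec => multiline {
      pattern => "%{LOGLEVEL}\s*\]"    # pattern marking the start of an entry
      negate => true                   # merge lines that do NOT match it
      what => "previous"               # equivalent to Filebeat's match: after
    }
  }
}
```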
Problem: how to replace the time field of the logs shown in Kibana with the timestamp inside the log message?

By default, the time field we see in Kibana differs from the time in the log message, because the default value of the time field is the time at which the log was collected. That field therefore needs to be replaced with the timestamp from the log message itself.

Solution: use the grok plug-in together with the date plug-in.
Configure the grok and date plug-ins in the filter section of the Logstash configuration file. For example, if the log format to match is "DEBUG[DefaultBeanDefinitionDocumentReader:106] Loading bean definitions", the time field can be parsed out in two ways: ① by referencing a separate pattern file, for example a file named customer_patterns with the content:
CUSTOMER_TIME %{YEAR}%{MONTHNUM}%{MONTHDAY}\s+%{TIME}
Note: the content format is: [custom pattern name] [regular expression]
It can then be referenced in Logstash through the grok plug-in's patterns_dir option.

② Inline, as a configuration item, following the rule: (?<field_name>regular expression)
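Combining the two plug-ins, a minimal filter sketch (the pattern directory, field name, and time layout are illustrative assumptions): grok extracts the timestamp into a field, and date overwrites @timestamp with it.

```
filter {
  grok {
    patterns_dir => ["./patterns"]     # directory containing customer_patterns
    match => { "message" => "%{CUSTOMER_TIME:customer_time}" }
  }
  date {
    # example layout matching %{YEAR}%{MONTHNUM}%{MONTHDAY}\s+%{TIME}
    match => ["customer_time", "yyyyMMdd HH:mm:ss,SSS"]
    target => "@timestamp"             # replace Kibana's default time field
  }
}
```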
Problem: how to view data in Kibana by selecting different system log modules?
The log data displayed in Kibana generally mixes data from different system modules, so how can we select or filter to view only the log data of a specified module?

Solution: add a field that identifies the system module, or build ES indexes per system module.
1. Add a field that identifies the system module; Kibana can then filter and query the data of different modules based on that field. Taking the second deployment architecture as an example, the Filebeat configuration adds a log_from field to mark which module each log comes from.
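A sketch of the Filebeat configuration for this approach (the module name and path are illustrative assumptions):

```yaml
# Tag events from each module with a log_from field that
# Kibana can later filter on.
filebeat.prospectors:                  # "filebeat.inputs" in newer versions
  - type: log
    paths:
      - /var/log/order-service/*.log   # example module log path
    fields:
      log_from: order-service          # example module identifier
    fields_under_root: true            # place log_from at the event top level
```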
2. Build ES indexes according to the system module. Again taking the second deployment architecture, this involves two steps:
① In Filebeat, identify the different system modules through document_type.
② In the Logstash output, add an index attribute, where %{type} means the ES index name is built from the document_type value.
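A sketch of step ② in the Logstash output (the Elasticsearch address and index naming are example values):

```
# The elasticsearch output uses the event's type field (set from
# Filebeat's document_type) to pick the index name.
output {
  elasticsearch {
    hosts => ["http://es-node:9200"]   # example Elasticsearch address
    index => "%{type}-%{+YYYY.MM.dd}"  # one index per module per day
  }
}
```

Note that document_type was removed in Filebeat 6.x; on newer versions a custom field (as in approach 1) serves the same purpose.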
This article introduced three deployment architectures for ELK real-time log analysis and the problems each architecture solves. Of the three, the second is the most popular and most commonly used. It then covered some common problems and solutions when using ELK for log analysis. In the end, ELK can be used not only for centralized query and management of distributed log data, but also in scenarios such as application and server resource monitoring.