
Distributed real-time log analysis solution ELK deployment architecture

不言 (Original)
2018-05-05 15:10

ELK has become the most popular centralized logging solution today. It is composed mainly of Beats, Logstash, Elasticsearch, and Kibana, which together provide a one-stop solution for collecting, storing, and visualizing logs in real time. This article introduces the deployment architectures of the distributed real-time log analysis solution ELK.


1. Overview

ELK is composed mainly of Beats, Logstash, Elasticsearch, and Kibana, which together handle the collection, storage, and display of real-time logs. This article introduces the common ELK architectures and the problems each one addresses.

  • Filebeat: a lightweight data collection engine that consumes very few server resources. A newer member of the ELK family, it can replace Logstash as the log collector on application servers, and it supports shipping the collected data to queues such as Kafka and Redis.

  • Logstash: a data collection engine. It is heavier than Filebeat, but it integrates a large number of plug-ins and supports a rich set of data sources. The collected data can be filtered, parsed, and formatted.

  • Elasticsearch: a distributed search and analytics engine built on Apache Lucene. It can be clustered and provides centralized storage and analysis of data, along with powerful search and aggregation capabilities.

  • Kibana: a data visualization platform. This web UI lets you view the data in Elasticsearch in real time and provides rich charting and statistics.

2. ELK common deployment architecture

2.1. Logstash as a log collector

This is the most primitive deployment architecture: a Logstash instance is deployed on each application server as the log collector; the data collected by Logstash is filtered, parsed, and formatted, then sent to Elasticsearch for storage, and finally visualized in Kibana. The drawback of this architecture is that Logstash consumes significant server resources, which increases the load on the application servers.

[Figure: architecture with Logstash deployed on each application server as the log collector]

2.2. Filebeat as a log collector

The only difference between this architecture and the first is that the log collector on the application side is replaced by Filebeat. Filebeat is lightweight and consumes few server resources, which is why it is chosen as the collector on the application servers. Filebeat is generally used together with Logstash, and this is currently the most common deployment.

[Figure: architecture with Filebeat as the log collector, forwarding to Logstash]

2.3. Deployment architecture that introduces cache queue

This architecture builds on the second one by introducing a Kafka message queue (any other message queue would also work): the data collected by Filebeat is sent to Kafka, and Logstash then reads the data from Kafka. This architecture targets log collection at large data volumes; the cache queue is mainly there to protect against data loss and to smooth the load on Logstash and Elasticsearch.

[Figure: architecture with a Kafka cache queue between Filebeat and Logstash]
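As a sketch of this architecture (the topic name, broker address, and log paths are placeholders for illustration, not from the original article), Filebeat ships to a Kafka topic and Logstash consumes it:

```yaml
# filebeat.yml (sketch): ship collected log lines to a Kafka topic
filebeat.prospectors:
- type: log
  paths:
    - /var/log/app/*.log      # placeholder path

output.kafka:
  hosts: ["kafka1:9092"]      # Kafka broker list (placeholder)
  topic: "app-logs"           # topic consumed by Logstash below
```

```conf
# Logstash pipeline (sketch): read the same topic, then forward to Elasticsearch
input {
  kafka {
    bootstrap_servers => "kafka1:9092"
    topics => ["app-logs"]
  }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}
```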

2.4. Summary of the three architectures

The first deployment architecture has the resource consumption problem and is now rarely used; the second is currently the most widely used. As for the third, I personally feel there is no need to introduce a message queue unless you have other requirements, because even at large data volumes Filebeat sends data to Logstash or Elasticsearch using a pressure-sensitive protocol: if Logstash is busy processing data, it tells Filebeat to slow down its reads; once the congestion is resolved, Filebeat resumes its original speed and continues sending data.


3. Problems and Solutions

Problem: how to merge multi-line log entries?

Application logs are generally printed in a fixed format, but the data belonging to a single log entry may span multiple lines. When collecting logs with ELK, the lines belonging to the same entry therefore need to be merged back into one event.

Solution: use the multiline merge plug-in in Filebeat or Logstash.

When using the multiline plug-in, note that different ELK deployment architectures use it differently. With the first architecture in this article, multiline should be configured in Logstash; with the second, multiline should be configured in Filebeat, and no multiline configuration is needed in Logstash.

1. How to configure multiline in Filebeat:

[Figure: Filebeat multiline configuration example]
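The screenshot is not reproduced here; a minimal Filebeat sketch (the log path is a placeholder) combining the three multiline options discussed below might look like:

```yaml
filebeat.prospectors:
- type: log
  paths:
    - /var/log/app/*.log      # placeholder path
  multiline:
    pattern: '^\['            # lines NOT starting with "[" belong to the previous event
    negate: true
    match: after
```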

  • pattern: the regular expression to match.

  • negate: defaults to false, meaning lines that match pattern are merged into the previous line; true means lines that do not match pattern are merged into the previous line.

  • match: after appends the merged lines to the end of the preceding matching line; before prepends them to the following matching line.

For example:

  pattern: '^\['
  negate: true
  match: after

This configuration merges the lines that do not match the pattern onto the end of the previous line. (Note that a lone unescaped "[" is not a valid regular expression; the opening bracket must be escaped as shown.)

2. How to configure multiline in Logstash:

[Figure: Logstash multiline codec configuration example]
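The screenshot is not reproduced here; a sketch of an equivalent Logstash configuration, applying the multiline codec to a beats input (the port is a placeholder), might look like:

```conf
input {
  beats {
    port => 5044
    codec => multiline {
      pattern => "%{LOGLEVEL}\s*]"   # the pattern discussed in (2) below
      negate => true
      what => "previous"             # equivalent to Filebeat's match: after
    }
  }
}
```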

(1) Setting the what attribute in Logstash to previous is equivalent to after in Filebeat; setting it to next is equivalent to before in Filebeat.

(2) In pattern => "%{LOGLEVEL}\s*]", LOGLEVEL is one of Logstash's built-in grok patterns. There are many such built-in patterns; for details see: https://github.com/logstash-p...

Question: how to replace the time field shown in Kibana with the timestamp from the log message?

By default, the time field we see in Kibana does not match the timestamp inside the log message, because the default value of the time field is the time when the log was collected. This field therefore needs to be replaced with the timestamp parsed from the log message.

Solution: use the grok parsing plug-in together with the date plug-in.

Configure the grok plug-in and the date plug-in in the filter section of the Logstash configuration file, for example:

[Figure: Logstash filter configuration with grok and date plug-ins]
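The screenshot is not reproduced here; a hedged sketch of such a filter (the field names and time format are assumptions for illustration) could be:

```conf
filter {
  grok {
    # parse the timestamp and level out of the raw message (field names assumed)
    match => { "message" => "%{TIMESTAMP_ISO8601:log_time}\s+%{LOGLEVEL:level}" }
  }
  date {
    # replace @timestamp (the field Kibana displays) with the parsed log time
    match => ["log_time", "yyyy-MM-dd HH:mm:ss,SSS"]
    target => "@timestamp"
  }
}
```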

If the log format to match is "DEBUG[DefaultBeanDefinitionDocumentReader:106] Loading bean definitions", the time field can be parsed out of the log in two ways:

① By referencing a custom pattern file. For example, a file named customer_patterns with the content:

  CUSTOMER_TIME %{YEAR}%{MONTHNUM}%{MONTHDAY}\s+%{TIME}

Note: the format is: [custom pattern name] [regular expression]

It can then be referenced in Logstash like this:

[Figure: grok configuration referencing the custom pattern file]
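The screenshot is not reproduced here; a sketch of how the custom pattern file could be referenced (the patterns_dir path is a placeholder):

```conf
filter {
  grok {
    patterns_dir => ["./patterns"]   # directory containing the customer_patterns file
    match => { "message" => "%{CUSTOMER_TIME:customer_time}" }
  }
}
```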

② As an inline configuration item, following the rule (?&lt;field_name&gt;regular matching rule), for example:

[Figure: grok configuration with an inline named capture]
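The screenshot is not reproduced here; a sketch using an inline named capture (the field name customer_time is an assumption):

```conf
filter {
  grok {
    # (?<name>regex) captures the match directly into a field, no pattern file needed
    match => { "message" => "(?<customer_time>%{YEAR}%{MONTHNUM}%{MONTHDAY}\s+%{TIME})" }
  }
}
```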

Question: how to view the data in Kibana by system module?

The log data displayed in Kibana generally mixes data from different system modules, so how do you select or filter only the log data of a specific module?

Solution: add a field that identifies the system module, or build separate ES indexes per system module.

1. Add a field that identifies the system module; Kibana can then filter and query data from different modules based on this field.

This is illustrated with the second deployment architecture. The Filebeat configuration is:

[Figure: Filebeat configuration] Logs from different system modules are identified by adding a log_from field.
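The screenshot is not reproduced here; a sketch of such a Filebeat configuration (the path and module name are placeholders):

```yaml
filebeat.prospectors:
- type: log
  paths:
    - /var/log/order-service/*.log
  fields:
    log_from: order-service     # identifies this module's logs in Kibana
  fields_under_root: true       # put log_from at the top level of the event
```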

2. Configure a separate ES index per system module, then create the corresponding index pattern in Kibana; different modules' data can then be selected on the page via the index-pattern drop-down.

This is again illustrated with the second deployment architecture, in two steps:

① The Filebeat configuration is:

[Figure: Filebeat configuration] Different system modules are identified via document_type.
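The screenshot is not reproduced here; a sketch of such a configuration (note that document_type existed in Filebeat 5.x and was removed in later versions; the path and name are placeholders):

```yaml
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/order-service/*.log
  document_type: order-service   # becomes the "type" field used in Logstash's output
```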

② Modify the output configuration content in Logstash as follows:

Add the index attribute to the output; %{type} means the ES index name is built from the document_type value.
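A sketch of such an output section (the hosts value and the daily date suffix are assumptions):

```conf
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # one index per document_type value, with a daily suffix
    index => "%{type}-%{+YYYY.MM.dd}"
  }
}
```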

4. Summary

This article introduced three deployment architectures for ELK real-time log analysis and the problems each architecture can solve. Among the three, the second is the most popular and most commonly used. It then covered some common problems and solutions when using ELK for log analysis. Beyond centralized query and management of distributed log data, ELK can also be used for scenarios such as application and server resource monitoring.

