
[Nightingale Monitoring] First time meeting Nightingale, still strong!

PHPz (reposted)
2023-06-09 10:01:21

Preface

Observability is a headache for most small and medium-sized companies, mainly in the following ways:

  1. Different open source tools have to be assembled to cover different functions, such as Skywalking for distributed tracing, ELK for log collection and monitoring, and Grafana + Prometheus for metrics monitoring.
  2. Each open source tool is an independent system, and historically they had nothing to do with one another (the Grafana stack has since been integrated).
  3. Data silos: traces, logs, and metrics are stored separately, with no correlation between them. The solutions currently on the market that do correlate them are either commercial products or developed in-house.

Nightingale (N9e), the subject of this article, is not truly unified either. At the current stage it still uses different open source components to implement different functions. N9e can display them all on the same main panel, but the data is still not correlated.

Then why do we still need to study N9e?

Because it is developing in this direction.

As mentioned above, Grafana is already doing this. With the Grafana + Loki + Tempo + Prometheus combination, logs, metrics, and traces can be correlated. So what is the difference between N9e and Grafana?

In Mr. Qin’s words: Grafana is better at managing monitoring panels, and N9e is better at managing alarm rules.

N9e can route different alarm rules to different business groups, so that no single group is flooded with alarm messages, which over time produces a "boy who cried wolf" effect.

Having said so much, what does N9e look like?

The following is a system I have deployed.

[Screenshot: the deployed N9e main panel]

As you can see, on this panel, we can implement:

  • Alarm management
  • Time series metric queries
  • Log analysis
  • Distributed tracing
  • Alarm self-healing
  • Personnel management
  • ...

This way you don’t need to switch back and forth between several applications, which saves time.

System Architecture

If you don’t understand the architecture, everything else is in vain.

Now let’s take a look at the architecture of N9e. Understanding how N9e works at the architectural level is of great benefit to both deployment and maintenance.

N9e mainly offers two deployment solutions: a central converged deployment and an edge sinking hybrid deployment. Both are explained below.

Central converged deployment solution

First, the architecture diagram:

[Figure: architecture of the central converged deployment solution]

This solution sets up a single N9e cluster, and the monitoring data of all other regions is sent to it. This requires a good network connection between the central cluster and the other regions.

For the central cluster, it mainly includes the following components:

  • MySQL: used to store configuration information and alarm events.
  • Redis: used to store JWT Token, machine meta information and other data.
  • TSDB: Time series database, which stores monitoring indicators.
  • N9e: the core service, which handles web requests and provides the alarm engine.
  • LB: load-balances requests across multiple N9e instances.

For other Regions, you only need to deploy Categraf, which will push local monitoring data to the central cluster.

This architecture is simple and has relatively low maintenance costs. The prerequisite is that the network links between data centers are reasonably good. If they are not, use the following solution instead.
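As a rough sketch of what "only deploy Categraf in other regions" means in practice, each remote Categraf's writer would point at the central cluster's load balancer rather than at a local address. The helper name `render_categraf_writer` and the host `n9e-lb.example.internal` are hypothetical; the `[[writers]]` key and the `/prometheus/v1/write` path are the ones used in the single-machine setup later in this article.

```shell
# Hypothetical helper: print the Categraf writer fragment a remote region
# would use to push its metrics to the central N9e cluster through the LB.
render_categraf_writer() {
  lb_url="$1"   # assumed LB address, e.g. http://n9e-lb.example.internal:17000
  cat <<EOF
[[writers]]
url = "${lb_url}/prometheus/v1/write"
EOF
}

# Preview the fragment before merging it into conf/config.toml by hand.
render_categraf_writer "http://n9e-lb.example.internal:17000"
```

Other settings that carry an address, such as the heartbeat section, would presumably need the same host change; check the shipped configuration file rather than relying on this sketch.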

Edge sinking hybrid deployment solution

[Figure: architecture of the edge sinking hybrid deployment solution]

This architecture supplements the central deployment solution and mainly targets poor network conditions:

  1. Move the time series database TSDB, forwarding gateway, and alarm engine to a specific Region, and let the Region itself handle it. However, the Region still needs to establish a heartbeat connection with the central cluster, and users can still view the monitoring information of other Regions through the monitoring panel of the central cluster.
  2. If you already have Prometheus, you can also directly connect Prometheus as a data source.

When deploying the time series database, alarm engine, and forwarding gateway in the edge data center, note that the alarm engine depends on the central database because alarm rules need to be synchronized from it, and the forwarding gateway also depends on the database because it needs to register objects there. You therefore need to open network access from the edge to the database.

PS: This solution exists because the network is poor, yet it still needs network access to the central database, so it may still be affected by network problems.

Single-machine deployment

Why choose a single-machine deployment here?

Because I want to deploy the components one by one, which helps in understanding how N9e operates as a whole.

Tip: I am using Ubuntu 22.04.1.

Install MySQL

Tip: For the sake of speed, I installed MariaDB.

# Refresh the package index
$ sudo apt-get update
# Upgrade installed packages
$ sudo apt-get upgrade
# Install MariaDB
$ sudo apt-get install mariadb-server-10.6

MariaDB starts automatically once the installation is complete. Then set a password for the database user.

# Connect to the database
$ sudo mysql
# Grant privileges and set the password
> GRANT ALL PRIVILEGES ON *.* TO 'root'@'localhost' IDENTIFIED BY '1234';
> FLUSH PRIVILEGES;

Install Redis

# Refresh the package index
$ sudo apt-get update
# Upgrade installed packages
$ sudo apt-get upgrade
# Install Redis
$ sudo apt install redis-server

Redis starts automatically by default.

Install TSDB

N9e supports several TSDB options:

  • Prometheus
  • M3DB
  • VictoriaMetrics
  • InfluxDB
  • Thanos

Here I choose VictoriaMetrics.

# Download the binary package
$ wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.90.0/victoria-metrics-linux-amd64-v1.90.0.tar.gz
# Extract
$ tar xf victoria-metrics-linux-amd64-v1.90.0.tar.gz
# Start in the background
$ nohup ./victoria-metrics-prod &>victoria.log &

Check that port 8428 is listening.
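One way to script this check is a small helper that reads `ss -ntl` output, the same command this article uses later for port 17000. The function name `port_in_listing` is made up for illustration, and it assumes the usual `ss` column layout in which the fourth field is the local address and port.

```shell
# Hypothetical helper: succeed if the `ss -ntl` output on stdin shows a
# listener on the given port. Assumes field 4 is "Local Address:Port".
port_in_listing() {
  awk -v port="$1" '
    $1 == "LISTEN" {
      n = split($4, parts, ":")      # the port is whatever follows the last colon
      if (parts[n] == port) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# Example with a captured line; on a live host use: ss -ntl | port_in_listing 8428
if printf 'LISTEN 0 4096 *:8428 *:*\n' | port_in_listing 8428; then
  echo "8428 is listening"
fi
```

Matching on the port field rather than grepping for the bare number avoids false positives from other columns, such as a queue size that happens to contain the same digits.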

Install N9e

# Download the binary package (the latest version at the time of writing)
$ wget https://github.com/ccfos/nightingale/releases/download/v6.0.0-ga.3/n9e-v6.0.0-ga.3-linux-amd64.tar.gz
# Extract into its own directory
$ mkdir n9e
$ tar xf n9e-v6.0.0-ga.3-linux-amd64.tar.gz -C n9e/
# Enter the directory and check its contents
$ cd n9e
$ ll
total 35332
drwxrwxr-x  7 jokerbai jokerbai     4096 Apr 12 14:05 ./
drwxr-xr-x  4 jokerbai jokerbai     4096 Apr 12 14:05 ../
drwxrwxr-x  3 jokerbai jokerbai     4096 Apr 12 14:05 cli/
drwxrwxr-x 10 jokerbai jokerbai     4096 Apr 12 14:05 docker/
drwxrwxr-x  4 jokerbai jokerbai     4096 Apr 12 14:09 etc/
drwxrwxr-x 20 jokerbai jokerbai     4096 Apr 12 14:05 integrations/
-rwxr-xr-x  1 jokerbai jokerbai 25280512 Apr  6 19:05 n9e*
-rwxr-xr-x  1 jokerbai jokerbai 10838016 Apr  6 19:05 n9e-cli*
-rw-r--r--  1 jokerbai jokerbai    29784 Apr  6 19:04 n9e.sql
drwxrwxr-x  6 jokerbai jokerbai     4096 Apr 12 14:05 pub/

Then import the N9e database.

# Import the database schema (enter the password set earlier when prompted)
$ mysql -uroot -p <n9e.sql

Edit the N9e configuration file, etc/config.toml in the current directory, so that the writer URL points to the VictoriaMetrics instance started earlier on port 8428:

[[Pushgw.Writers]]
# Url = "http://127.0.0.1:8480/insert/0/prometheus/api/v1/write"
Url = "http://127.0.0.1:8428/api/v1/write"
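Because the fragment carries a commented-out alternative URL next to the active one, it is easy to leave the wrong line enabled. The following sketch lists only the uncommented `Url` entries. The helper name `active_writer_urls` is hypothetical, and it does plain text matching rather than real TOML parsing, so it reports every uncommented `Url = "..."` line in whatever it is given.

```shell
# Hypothetical helper: print the quoted value of each uncommented
# `Url = "..."` line found on stdin. Lines starting with # are skipped
# because the pattern requires Url to be the first non-blank token.
active_writer_urls() {
  sed -n 's/^[[:space:]]*Url[[:space:]]*=[[:space:]]*"\([^"]*\)".*$/\1/p'
}

# Example on the fragment from this article;
# on a real install: active_writer_urls < etc/config.toml
active_writer_urls <<'EOF'
[[Pushgw.Writers]]
# Url = "http://127.0.0.1:8480/insert/0/prometheus/api/v1/write"
Url = "http://127.0.0.1:8428/api/v1/write"
EOF
```

For the fragment above this prints only the port 8428 URL, which is the one N9e will write to.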

Then start the N9e service.

# Start the service in the background
$ nohup ./n9e &>n9e.log &
# Check that port 17000 is listening
$ ss -ntl | grep 17000
LISTEN 0      4096   *:17000   *:*

Enter http://127.0.0.1:17000 in the browser, then enter the username root and password root.2020 to log in to the system.


Install Categraf

Categraf is a monitoring collection agent. It pushes the data it collects to N9e, which forwards it to the TSDB.

# Download
$ wget https://download.flashcat.cloud/categraf-v0.2.38-linux-amd64.tar.gz
# Extract
$ tar xf categraf-v0.2.38-linux-amd64.tar.gz
# Enter the directory
$ cd categraf-v0.2.38-linux-amd64/

Modify the configuration file. In conf/config.toml, the modified parts are as follows:

[[writers]]
url = "http://127.0.0.1:17000/prometheus/v1/write"

[heartbeat]
enable = true

Then start Categraf.

$ nohup ./categraf &>categraf.log &

Then you can see the basic information on the main interface.
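It can also be useful to confirm that samples actually reached the TSDB, independently of the web UI. The sketch below is a quick smoke test, not part of N9e or Categraf: `query_has_samples` is a made-up helper that inspects a Prometheus-style `/api/v1/query` response with plain text matching, assuming compact JSON without spaces. The sample response is illustrative, not captured output.

```shell
# Hypothetical helper: read a Prometheus-style /api/v1/query JSON response
# on stdin and succeed only if it reports success with a non-empty result.
# This is text matching, not JSON parsing, so treat it as a smoke test.
query_has_samples() {
  resp="$(cat)"
  case "$resp" in
    *'"status":"success"'*) ;;
    *) return 1 ;;
  esac
  case "$resp" in
    *'"result":[]'*) return 1 ;;
    *) return 0 ;;
  esac
}

# On a live host, count every series in VictoriaMetrics and pipe the reply in:
#   curl -s -G 'http://127.0.0.1:8428/api/v1/query' \
#        --data-urlencode 'query=count({__name__=~".+"})' | query_has_samples
sample='{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1686276000,"42"]}]}}'
if printf '%s' "$sample" | query_has_samples; then
  echo "TSDB has samples"
fi
```

If the check fails while Categraf is running, the writer URL in conf/config.toml and the `[[Pushgw.Writers]]` URL in the N9e configuration are the first two places to look.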


Add data source

If you try to query time series metrics now, nothing comes back, because no data source has been added yet.

Add a data source under System Configuration -> Data Source:

[Screenshot: adding a data source]

Then you can see the corresponding metric data.

You can also view the host’s monitoring data through the built-in dashboard:

[Screenshot: built-in host dashboard]

Summary

This article gives a first impression of Nightingale, briefly introduces its overall architecture, and then walks through an installation from 0 to 1 so that everyone gets a clear understanding of Nightingale’s components.

Nightingale has now been updated to V6. This version tries out many new features, such as integration with ELK and Jaeger. This series will continue to be updated.


Statement:
This article is reproduced from 51cto.com.