From monitoring to diagnostics: the power of data
Monitoring and diagnosis have always been two of the most important parts of database operation and maintenance. In the traditional operation and maintenance model, both are people-centered, so the collection of indicators and data has to be designed around people as well.
Monitoring data has to be read by people. By looking at it, a person can spot anomalies or areas that deserve attention, and DBAs with different levels of experience will see different levels of risk in the same data. Because a person has to read it, the display should not show too many indicators, or the person monitoring will be overwhelmed. In fact, even the key indicators in the picture above are too many to watch.
For monitoring that relies on people, a concise and intuitive indicator display is essential. For a database, manual monitoring works best when it focuses on only three to five key indicators. One of my financial clients watches only the active sessions indicator for its core system. A monitor keeps an eye on this indicator at all times and, as soon as it looks abnormal, clicks on it to start diagnostic analysis.
This is a historical indicator data page that we modified to meet their needs. Once the active sessions indicator exceeds its threshold, the monitor clicks through to diagnose. On this page we provide a "Problem Analysis" tool.
The problem analysis tool analyzes problems in the system, current or historical, within a chosen time window, while the wait event analysis tool approaches them from the perspective of wait events to help the DBA find possible performance problems.
Either way, the purpose of monitoring is to make the DBA's work easier: it serves people and is centered on people. Some may disagree and argue that monitoring can also be automated, for example with baseline alarms. In fact, baseline alarms are no different. A baseline alarm can tell you by SMS that the number of active sessions is abnormal, but a person still has to act on it. And if the baseline alarm template covers too many indicators, the resulting alarm storms become very hard to handle, and inaccurate alarms make the alarm function seem useless.
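To make the idea of a baseline alarm concrete, here is a minimal sketch in Python. It assumes the baseline is simply the mean of recent samples plus a multiple of their standard deviation. The function name `exceeds_baseline` and the factor `k` are illustrative assumptions, not how any particular monitoring product computes its baselines.

```python
from statistics import mean, stdev


def exceeds_baseline(history, current, k=3.0):
    """Return True if `current` is more than k standard deviations
    above the mean of `history` (a hypothetical, simplified baseline)."""
    if len(history) < 2:
        # Not enough samples to estimate a spread, so never alarm.
        return False
    threshold = mean(history) + k * stdev(history)
    return current > threshold


# Recent samples of the active sessions indicator (made-up numbers).
recent_active_sessions = [10, 12, 11, 9, 13, 10, 11, 12]
print(exceeds_baseline(recent_active_sessions, 40))
print(exceeds_baseline(recent_active_sessions, 13))
```

Applying a rule like this to only a handful of key indicators, rather than to every indicator in a template, is one way to keep the number of alarms manageable.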
Traditional diagnosis is also people-centered. When the system has a problem, the DBA searches the system for relevant information and analyzes it. This analysis depends heavily on the DBA's personal ability. When users hit a serious problem, they always hope that a senior expert can come on site and deal with it as soon as possible.
As enterprises digitize, the cost of this people-centered model of monitoring and diagnosis keeps rising, and experts are less willing to work on the front line. Saving labor costs and experts' time has therefore become a very important requirement in database operation and maintenance. At the same time, as hardware has improved, the costs of collecting, storing, and processing data have become very low, so in a modern database monitoring system it is no longer cost-prohibitive to collect and keep more complete monitoring data.
If the data collected day to day is rich enough, automated diagnosis and remote diagnosis become possible. The data that diagnosis needs is already in the offline repository of collected data, so most diagnostic tools no longer need to collect data ad hoc from the database instance. When an abnormality occurs in the database, an automated diagnostic tool can then run its analysis in the background without risk.
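As a rough illustration of diagnosis that reads only from already-collected data, the sketch below ranks indicators by how much their average inside a problem window departs from their average before it. The table name `metric_samples`, its columns (`name`, `ts`, `value`), and the function `rank_anomalous_metrics` are all assumptions made for this example. A SQLite database stands in for the offline repository, and the monitored instance is never queried.

```python
import sqlite3
from statistics import mean


def rank_anomalous_metrics(conn, window_start, window_end):
    """Rank indicators by the ratio of their mean inside
    [window_start, window_end) to their mean before window_start.

    Reads only the hypothetical offline table metric_samples(name, ts, value).
    """
    rows = conn.execute(
        "SELECT name, ts, value FROM metric_samples WHERE ts < ?",
        (window_end,),
    ).fetchall()

    before, during = {}, {}
    for name, ts, value in rows:
        bucket = during if ts >= window_start else before
        bucket.setdefault(name, []).append(value)

    ranked = []
    for name, values in during.items():
        baseline = before.get(name)
        if not baseline:
            continue  # no earlier samples to compare against
        base_mean = mean(baseline)
        ratio = mean(values) / base_mean if base_mean else float("inf")
        ranked.append((name, ratio))

    # Largest relative increase first.
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

A real diagnostic tool would apply far richer rules, but the shape is the same: the inputs are a time window and the stored history, not a live connection to the faulty instance.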
"No risk" here means that the automated diagnosis itself poses no risk to the database instance. If automated diagnosis has to collect some data from the database on the spot, and that collection is itself risky, it can add insult to injury on an instance that is already faulty. We once built a tool that automatically diagnosed and analyzed shared pool fragmentation, which needed to analyze KGH data. That tool once brought down a database. So in both automated indicator collection and automated diagnosis, we do our best to avoid this kind of risk.
The most important force behind all of this is data. Data is the foundation of automated monitoring and diagnosis. In fact, in automated database operation and maintenance, the indicator set and the data collection themselves embody a great deal of operation and maintenance knowledge. Which indicators to collect for a given database, and how to collect them without risk, is very valuable operation and maintenance knowledge.
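One way to picture how an indicator set can carry this kind of knowledge is to record, next to each indicator, how often to collect it and whether collecting it is safe on an instance that is already in trouble. The sketch below is purely illustrative: the `IndicatorSpec` fields, the `select_for_collection` helper, and the example entries are assumptions for this article, not the actual D-SMART indicator format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndicatorSpec:
    """Hypothetical description of one indicator in an indicator set."""
    name: str
    query: str              # how the value is collected
    interval_seconds: int   # how often to collect it
    safe_under_load: bool   # may it run on an instance that is already degraded?


def select_for_collection(specs, instance_degraded):
    """Return the indicators that may be collected right now.

    On a degraded instance, anything not marked safe is skipped so that
    collection does not make the fault worse.
    """
    if not instance_degraded:
        return list(specs)
    return [spec for spec in specs if spec.safe_under_load]


EXAMPLE_SPECS = [
    IndicatorSpec(
        name="active_sessions",
        query="SELECT COUNT(*) FROM v$session WHERE status = 'ACTIVE'",
        interval_seconds=10,
        safe_under_load=True,
    ),
    IndicatorSpec(
        name="shared_pool_heap_scan",  # illustrative stand-in for a heavy scan
        query="<heavy heap scan, query omitted>",
        interval_seconds=3600,
        safe_under_load=False,
    ),
]

for spec in select_for_collection(EXAMPLE_SPECS, instance_degraded=True):
    print(spec.name)
```

The `safe_under_load` flag is where experience, such as a tool that once brought down a database, would be written down so that the collector can act on it automatically.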
This year, we will open source the D-SMART indicator sets for Oracle, MySQL, PostgreSQL, Dameng, Jincang, and other databases. We hope everyone will join us in enriching and improving this open source indicator set.