MySQL series (12): MySQL monitoring operations
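MySQL monitoring is one module of DB monitoring and covers collection, display, and alerting. This article introduces the main metrics of MySQL monitoring and how they are collected.
The logic of MySQL monitoring is similar to that of Redis monitoring; see the article "Redis Monitoring".
When a DBA adds MySQL monitoring from the front end, the system calls the automatic scheduling platform's interface to send the encrypted account and password, the ip and port, and other MySQL monitoring information to the target machine, and sends the collection Agent along with them.
I. Collection metrics and commands
1. MySQL service running status
By convention, every MySQL service must bind to ip1 (the intranet ip). Each machine has only one ip1 but may have several ports, that is, several MySQL Servers. The collection program reads the ip/port information file to determine whether a server exists.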
sockParam=`ps aux | grep -P "mysqld.*--port=${port}" | grep -oP " --socket.*\.sock"`
# If empty, the mysql socket configuration for this port could not be obtained; check that the mysql configuration is correct
MYSQL="/usr/local/mysql/bin/mysql -hlocalhost --port=${port} ${sockParam} -u${user} -p${password} "
MYSQL_ADMIN="/usr/local/mysql/bin/mysqladmin -hlocalhost --port=${port} ${sockParam} -u${user} -p${password} "
curStatus=`${MYSQL} -e"show global status"`
# If empty, the mysql status could not be obtained; check that mysql is running normally
if [ -z "${curStatus}" ]
then
    portExists=0
else
    echo "${curStatus}" >> ${curFile}
    portExists=1
fi
2. Number of connections
${MYSQL_ADMIN} processlist -v | wc -l
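Note that mysqladmin prints the process list as an ASCII table, so a plain `wc -l` also counts the border and header lines. The following is a minimal sketch of counting only the data rows, assuming the usual table layout (rows start with `|`, and the header row has `Id` in its first column). The function name `count_connections` and the sample table are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count data rows in mysqladmin's table output read from stdin.
# Border lines start with "+"; the header row has "Id" as its first column.
count_connections() {
    awk -F'|' '
        /^[|]/ {
            col = $2
            gsub(/^[ \t]+|[ \t]+$/, "", col)   # trim the first column
            if (col != "Id") n++               # skip the header row
        }
        END { print n + 0 }'
}

# Sample text standing in for the output of: ${MYSQL_ADMIN} processlist -v
sample='+----+------+-----------+----+---------+------+-------+-----------------------+
| Id | User | Host      | db | Command | Time | State | Info                  |
+----+------+-----------+----+---------+------+-------+-----------------------+
| 12 | app  | 10.0.0.5  | t  | Sleep   | 3    |       |                       |
| 15 | mon  | localhost |    | Query   | 0    | init  | show full processlist |
+----+------+-----------+----+---------+------+-------+-----------------------+'

connections=$(printf '%s\n' "$sample" | count_connections)
echo "connections=${connections}"
```

The count still includes the monitoring account's own connection, which may or may not be wanted.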
3. Number of threads
grep 'Threads_connected' ${curFile} | awk '{print $2}'
4. Number of slow queries
grep 'Slow_queries' ${curFile} | awk -F ' ' '{print $2}'
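Take the difference between two consecutive samples of Slow_queries to get the number of slow queries in the last minute. The previous sample is stored in last.cache.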
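The difference calculation could look roughly like the sketch below. It assumes that last.cache holds the previous status dump in the same "name value" format as the current one, which the article does not specify. The helper names `get_status` and `counter_delta` are hypothetical.

```shell
#!/bin/sh
# Read one counter from a "show global status" dump (name<whitespace>value per line).
# Comparing the whole first column avoids matching similarly named counters.
get_status() {   # usage: get_status FILE NAME
    awk -v name="$2" '$1 == name { print $2; exit }' "$1"
}

# Difference between two samples. If the counter went backwards
# (for example after a server restart), fall back to the current value.
counter_delta() {   # usage: counter_delta CUR PREV
    if [ "$1" -ge "$2" ]; then
        echo $(( $1 - $2 ))
    else
        echo "$1"
    fi
}

# Demo with temporary files standing in for ${curFile} and last.cache
# (assumption: last.cache keeps the previous dump in the same format).
tmpDir=$(mktemp -d)
printf 'Slow_queries\t120\nThreads_connected\t8\n' > "${tmpDir}/last.cache"
printf 'Slow_queries\t127\nThreads_connected\t9\n' > "${tmpDir}/cur.status"

cur=$(get_status "${tmpDir}/cur.status" Slow_queries)
prev=$(get_status "${tmpDir}/last.cache" Slow_queries)
slowQueries=$(counter_delta "${cur}" "${prev}")
echo "slow queries in the last interval: ${slowQueries}"
rm -rf "${tmpDir}"
```

Handling the case where the counter decreases is a design choice of this sketch: status counters restart from zero when the server restarts, and a negative difference would otherwise be reported.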
5. Number of open tables
grep 'Open_tables' ${curFile} | awk -F ' ' '{print $2}'
6. select statements executed per second
grep 'Com_select' ${curFile} | awk -F ' ' '{print $2}'
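Take the difference between two consecutive samples of Com_select and divide it by the time between them to get the average number of statements executed per second over the last minute. The previous sample is stored in last.cache.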
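As a rough sketch of the per-second calculation, the function below divides the counter difference by the sampling interval using integer arithmetic. The name `per_second` and the sample numbers are hypothetical. Returning 0 when the counter was reset is an assumption of this sketch, not something the article states.

```shell
#!/bin/sh
# Average per-second rate of a cumulative counter between two samples
# (integer division, so fractions are dropped).
per_second() {   # usage: per_second CUR PREV INTERVAL_SECONDS
    cur=$1; prev=$2; interval=$3
    if [ "${interval}" -le 0 ] || [ "${cur}" -lt "${prev}" ]; then
        echo 0      # no elapsed time, or the counter was reset
        return
    fi
    echo $(( (cur - prev) / interval ))
}

# Hypothetical Com_select samples taken 60 seconds apart
mysqlSelectNumPS=$(per_second 1845000 1839000 60)
echo "select/s: ${mysqlSelectNumPS}"
```

The same function would apply unchanged to the other Com_* and Innodb_rows_* counters.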
7. delete statements executed per second
grep 'Com_delete' ${curFile} | grep -v 'multi' | awk -F ' ' '{print $2}'
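Take the difference between two consecutive samples of Com_delete and divide it by the time between them to get the average number of statements executed per second over the last minute. The previous sample is stored in last.cache.
8. insert statements executed per second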
grep 'Com_insert' ${curFile} | grep -v 'select' | awk -F ' ' '{print $2}'
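Take the difference between two consecutive samples of Com_insert and divide it by the time between them to get the average number of statements executed per second over the last minute. The previous sample is stored in last.cache.
9. update statements executed per second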
grep 'Com_update' ${curFile} | grep -v 'multi' | awk -F ' ' '{print $2}'
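Take the difference between two consecutive samples of Com_update and divide it by the time between them to get the average number of statements executed per second over the last minute. The previous sample is stored in last.cache.
10. replace statements executed per second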
grep 'Com_replace' ${curFile} | grep -v 'select' | awk -F ' ' '{print $2}'
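Take the difference between two consecutive samples of Com_replace and divide it by the time between them to get the average number of statements executed per second over the last minute. The previous sample is stored in last.cache.
11. Innodb_rows_deleted per second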
grep 'Innodb_rows_deleted' ${curFile} | awk -F ' ' '{print $2}'
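Take the difference between two consecutive samples of Innodb_rows_deleted and divide it by the time between them to get the average number of rows per second over the last minute. The previous sample is stored in last.cache.
12. Innodb_rows_inserted per second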
grep 'Innodb_rows_inserted' ${curFile} | awk -F ' ' '{print $2}'
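Take the difference between two consecutive samples of Innodb_rows_inserted and divide it by the time between them to get the average number of rows per second over the last minute. The previous sample is stored in last.cache.
13. Innodb_rows_read per second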
grep 'Innodb_rows_read' ${curFile} | awk -F ' ' '{print $2}'
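Take the difference between two consecutive samples of Innodb_rows_read and divide it by the time between them to get the average number of rows per second over the last minute. The previous sample is stored in last.cache.
14. Innodb_rows_updated per second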
grep 'Innodb_rows_updated' ${curFile} | awk -F ' ' '{print $2}'
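Take the difference between two consecutive samples of Innodb_rows_updated and divide it by the time between them to get the average number of rows per second over the last minute. The previous sample is stored in last.cache.
15. InnoDB rows total per second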
expr ${innodbRowsDeletedPS} + ${innodbRowsInsertedPS} + ${innodbRowsReadPS} + ${innodbRowsUpdatedPS}
This is the sum of the four per-second Innodb_rows_* values above.
16. Commands processed per second (qps)
expr ${mysqlSelectNumPS} + ${mysqlInsertNumPS} + ${mysqlUpdateNumPS} + ${mysqlDeleteNumPS} + ${mysqlReplaceNumPS}
This is the sum of the five per-second Com_* values above.
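One caveat with `expr` is that it fails when an operand is empty and exits with status 1 when the result is 0. A hedged alternative using shell arithmetic is sketched below; the function name `sum_rates` and the sample values are hypothetical.

```shell
#!/bin/sh
# Sum per-second rates. An empty or unset rate counts as 0, so one
# missing counter does not break the arithmetic.
sum_rates() {
    total=0
    for v in "$@"; do
        total=$(( total + ${v:-0} ))
    done
    echo "${total}"
}

# Hypothetical per-second values for select, insert, update, delete, replace
mysqlSelectNumPS=100; mysqlInsertNumPS=20; mysqlUpdateNumPS=15
mysqlDeleteNumPS=5;   mysqlReplaceNumPS=""
qps=$(sum_rates "${mysqlSelectNumPS}" "${mysqlInsertNumPS}" "${mysqlUpdateNumPS}" \
                "${mysqlDeleteNumPS}" "${mysqlReplaceNumPS}")
echo "qps=${qps}"
```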
17. Bytes received per second (KByte/s)
grep 'Bytes_received' ${curFile} | awk -F ' ' '{print $2}'
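Take the difference between two consecutive samples of Bytes_received and divide it by the time between them to get bytes per second over the last minute, then divide by 1024 to get KByte/s. The previous sample is stored in last.cache.
18. Bytes sent per second (KByte/s)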
grep 'Bytes_sent' ${curFile} | awk -F ' ' '{print $2}'
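Take the difference between two consecutive samples of Bytes_sent and divide it by the time between them to get bytes per second over the last minute, then divide by 1024 to get KByte/s. The previous sample is stored in last.cache.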
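The byte counters need fractional results, which shell integer arithmetic cannot provide. The sketch below does the conversion in awk with two decimals; `kbytes_per_second` and the sample numbers are hypothetical.

```shell
#!/bin/sh
# Convert the difference of a byte counter into KByte/s with two decimals.
kbytes_per_second() {   # usage: kbytes_per_second CUR_BYTES PREV_BYTES INTERVAL_SECONDS
    awk -v cur="$1" -v prev="$2" -v t="$3" 'BEGIN {
        if (t <= 0 || cur < prev) { print "0.00"; exit }   # reset or bad interval
        printf "%.2f\n", (cur - prev) / t / 1024
    }'
}

# Hypothetical Bytes_received samples taken 60 seconds apart
recvKBps=$(kbytes_per_second 9830400 3686400 60)
echo "received: ${recvKBps} KByte/s"
```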
19. Number of table locks acquired immediately
grep 'Table_locks_immediate' ${curFile} | awk -F ' ' '{print $2}'
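Take the difference between two consecutive samples of Table_locks_immediate to get the number of locks acquired immediately in the last minute. The previous sample is stored in last.cache.
20. Number of table locks that could not be acquired immediately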
grep 'Table_locks_waited' ${curFile} | awk -F ' ' '{print $2}'
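Take the difference between two consecutive samples of Table_locks_waited to get the number of locks that could not be acquired immediately in the last minute. The previous sample is stored in last.cache.
21. Number of row lock waits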
grep 'Innodb_row_lock_waits' ${curFile} | awk -F ' ' '{print $2}'
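Take the difference between two consecutive samples of Innodb_row_lock_waits to get the number of times a row lock had to be waited for in the last minute. The previous sample is stored in last.cache.
22. Current number of dirty pages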
grep 'Innodb_buffer_pool_pages_dirty' ${curFile} | awk -F ' ' '{print $2}'
23. Number of buffer pool page flush requests
grep 'Innodb_buffer_pool_pages_flushed' ${curFile} | awk -F ' ' '{print $2}'
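Take the difference between two consecutive samples of Innodb_buffer_pool_pages_flushed to get the number of buffer pool page flush requests in the last minute. The previous sample is stored in last.cache.
24. Bytes written to the InnoDB log (KByte)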
grep 'Innodb_os_log_written' ${curFile} | awk -F ' ' '{print $2}'
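Take the difference between two consecutive samples of Innodb_os_log_written to get the bytes written to the log in the last minute, then divide by 1024 to get KByte. The previous sample is stored in last.cache.
25. Memory usage (MByte)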
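# PID of the mysqld process (excluding mysqld_safe and the grep itself)
pid=`ps aux | grep 'mysqld' | grep -Ev 'safe|grep' | awk '{print $2}'`
# Resident memory of that process, in kB
mem=`cat /proc/${pid}/status | grep 'VmRSS' | awk '{print $2}'`
mysqlMem=`echo "scale=2;${mem} / 1024" | bc`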
VmRSS is reported in kB, so dividing by 1024 gives MByte.
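Because one machine may run several MySQL Servers on different ports, a `ps | grep 'mysqld'` lookup can return more than one PID, and `/proc/${pid}/status` then no longer names a single file. The sketch below shows one possible way to select the PID for a given port and read its VmRSS. It assumes the `ps aux` column layout (command line starting at field 11) and a `--port=` argument on the mysqld command line; the helper names and sample data are hypothetical.

```shell
#!/bin/sh
# Pick the PID of the mysqld started with --port=PORT from "ps aux" text on stdin.
pid_for_port() {   # usage: ps aux | pid_for_port PORT
    awk -v opt="--port=$1" '
        /mysqld/ && !/mysqld_safe/ {
            for (i = 11; i <= NF; i++)
                if ($i == opt) { print $2; exit }
        }'
}

# Resident memory (MByte, two decimals) from a /proc/<pid>/status style file.
rss_mbytes() {   # usage: rss_mbytes STATUS_FILE
    awk '$1 == "VmRSS:" { printf "%.2f\n", $2 / 1024; found = 1; exit }
         END { if (!found) print "0.00" }' "$1"
}

# Demo data standing in for "ps aux" and /proc/${pid}/status
psSample='mysql  2101  1.0  5.0 900000 150000 ?  Sl  Jan01  10:00 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --port=3306 --socket=/tmp/mysql3306.sock
mysql  2202  1.0  5.0 900000 150000 ?  Sl  Jan01  10:00 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --port=3307 --socket=/tmp/mysql3307.sock'
pid=$(printf '%s\n' "$psSample" | pid_for_port 3307)

statusFile=$(mktemp)
printf 'Name:\tmysqld\nVmPeak:\t 2100000 kB\nVmRSS:\t 1572864 kB\n' > "${statusFile}"
mysqlMem=$(rss_mbytes "${statusFile}")
echo "pid=${pid} rss=${mysqlMem} MByte"
rm -f "${statusFile}"
```

On a real host the calls would be `ps aux | pid_for_port "${port}"` and `rss_mbytes "/proc/${pid}/status"`.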
26. HandlerSocket operations processed per second
curHsTableLock=`grep 'Hs_table_lock' ${curFile} | awk '{print $2}'`
preHsTableLock=`grep 'Hs_table_lock' ${preFile} | awk '{print $2}'`
if [ -n "${curHsTableLock}" ]
then
    hsQPS=`echo "scale=0;(${curHsTableLock} - ${preHsTableLock}) / ${intervalTime}" | bc`
else
    hsQPS=0
fi
27. Master-slave replication and status
# Master-slave information
# Is this server a slave?
slave_running=`grep 'Slave_running' ${curFile} | awk '{print $2}'`
if [ "${slave_running}A" = "ONA" ]
then
    slaveRunning=1
    slaveStatus=`${MYSQL} -e'show slave status\G'`
    echo "${slaveStatus}" > ${slaveFile}
    # Match "Name:" exactly and split on whitespace; splitting on ':' would leave
    # a leading space in the value, so the comparisons below would never match
    slaveIoRunning=`grep 'Slave_IO_Running:' ${slaveFile} | awk '{print $2}'`
    slaveSqlRunning=`grep 'Slave_SQL_Running:' ${slaveFile} | awk '{print $2}'`
    if [ "${slaveIoRunning}A" = "NoA" -o "${slaveSqlRunning}A" = "NoA" ]
    then
        slaveRunning=3
    fi
    secondsBehindMaster=`grep 'Seconds_Behind_Master:' ${slaveFile} | awk '{print $2}'`
    if [ "${secondsBehindMaster}A" = "NULLA" ]
    then
        secondsBehindMaster=8888 # master and slave are out of sync
    fi
    # On a slave, get the master's ip
    master=`grep 'Master_Host:' ${slaveFile} | awk '{print $2}'`
    masterPort=`grep 'Master_Port:' ${slaveFile} | awk '{print $2}'`
else
    master=""
    masterPort=""
    slaveRunning=0
    secondsBehindMaster=10000 # no check needed
fi
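The field extraction from `show slave status\G` can be tried without a database by running it against saved output. The sketch below uses a hypothetical helper `slave_field` and made-up sample output. Unlike the script above, it treats any thread state other than `Yes` as a problem, which is an assumption of this sketch.

```shell
#!/bin/sh
# Read one field from "show slave status\G" output (lines look like "   Name: value").
slave_field() {   # usage: slave_field FILE NAME
    awk -v key="$2:" '$1 == key { print $2; exit }' "$1"
}

# Made-up sample output standing in for ${slaveFile}
slaveFile=$(mktemp)
cat > "${slaveFile}" <<'EOF'
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.0.0.11
                  Master_Port: 3306
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
        Seconds_Behind_Master: NULL
EOF

slaveIoRunning=$(slave_field "${slaveFile}" Slave_IO_Running)
slaveSqlRunning=$(slave_field "${slaveFile}" Slave_SQL_Running)
secondsBehindMaster=$(slave_field "${slaveFile}" Seconds_Behind_Master)
master=$(slave_field "${slaveFile}" Master_Host)

slaveRunning=1
if [ "${slaveIoRunning}" != "Yes" ] || [ "${slaveSqlRunning}" != "Yes" ]; then
    slaveRunning=3             # a replication thread is not running
fi
if [ "${secondsBehindMaster}" = "NULL" ]; then
    secondsBehindMaster=8888   # same sentinel value as the script above
fi
echo "master=${master} slaveRunning=${slaveRunning} secondsBehindMaster=${secondsBehindMaster}"
rm -f "${slaveFile}"
```

Comparing the whole first column against `Name:` keeps the value free of leading spaces and avoids matching other fields whose names merely begin with the same text.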
Note: Seconds_Behind_Master is used as the indicator of master-slave delay. How is this value obtained, and why do many people question it? (This passage is quoted from http://blog.chinaunix.NET/uid-27038861-id-3686311.html)
Seconds_Behind_Master is obtained by comparing the timestamp (abbreviated ts) of the event being executed by sql_thread with the timestamp of the event copied by io_thread; the difference is the reported value. The content of the relay log is identical to the master's binlog, and each SQL statement is recorded together with its ts, so the reference value for the comparison comes from the binlog. The master and slave therefore do not need to be synchronized through NTP, which means their clocks do not have to agree. The comparison actually takes place between io_thread and sql_thread, and only io_thread is directly connected to the master. This is where the problem arises: when the master's I/O load is heavy or the network is congested, io_thread cannot copy the binlog in time (it is not interrupted, it is still copying), while sql_thread keeps pace with io_thread. Seconds_Behind_Master is then 0, which we read as no delay, although the slave is in fact behind. This is why the parameter is criticized as an inaccurate way to monitor replication delay. The value is not always inaccurate, though: when the network between io_thread and the master is good, it is accurate and useful.
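A small worked example may make the zero-delay case concrete. It follows the model described in the quoted passage, with made-up timestamps; all variable names are hypothetical.

```shell
#!/bin/sh
# Illustrative timestamps in seconds; all values are made up.
masterLatestTs=1000   # newest event written to the master's binlog
ioThreadTs=940        # newest event io_thread has copied into the relay log
sqlThreadTs=940       # event sql_thread has just executed

reportedDelay=$(( ioThreadTs - sqlThreadTs ))      # what the quoted model reports
actualDelay=$(( masterLatestTs - sqlThreadTs ))    # how far the slave's data really lags
echo "reported=${reportedDelay}s actual=${actualDelay}s"
```

Here sql_thread has caught up with io_thread, so the reported value is 0 even though the slave is 60 seconds behind the master.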
The quoted source also notes that Seconds_Behind_Master can be negative. The value is the difference between the latest ts seen by io_thread and the ts of the event executed by sql_thread, and the former is normally greater than or equal to the latter. The only possibility is that the ts of some event is wrong and smaller than that of an earlier event; when that happens, a negative value can appear.
28. Check the collection Agent's heartbeat