How to identify and cure MySQL replication slave lag

Home

Database

Mysql Tutorial

How to identify and cure MySQL replication slave lag_MySQL

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jun 01, 2016 pm 01:16 PM

How to identify and cure MySQL replication slave lag Here on thePercona MySQL Supportteam, we often see issues where a customer is complaining about replication delays – and many times the problem ends up being tied to MySQL replication slave lag. This of course is nothing new for MySQL users and we’ve had a few posts here on the MySQL Performance Blog on this topic over the years(two particularly popular post in the past were: “Reasons for MySQL Replication Lag” and “Managing Slave Lag with MySQL Replication,” both by Percona CEO Peter Zaitsev).

In today’s post, however, I will share some new ways of identifying delays in replication – including possible causes of lagging slaves – and how to cure this problem.

How to identify Replication Delay
MySQL replication works with two threads, IO_THREAD & SQL_THREAD. IO_THREAD connects to a master, reads binary log events from the master as they come in and just copies them over to a local log file calledrelay log. On the other hand, SQL_THREAD reads events from a relay log stored locally on the replication slave (the file that was written by IO thread) and then applies them as fast as possible. Whenever replication delays, it’s important to discover first whether it’s delaying on slave IO_THREAD or slave SQL_THREAD.

Normally, I/O thread would not cause a huge replication delay as it is just reading the binary logs from the master. However, It depends on the network connectivity, network latency… how fast is that between the servers. The Slave I/O thread could be slow because of high bandwidth usage. Usually, when the slave IO_THREAD is able to read binary logs quickly enough it copies and piles up the relay logs on the slave – which is one indication that the slave IO_THREAD is not the culprit of slave lag.

On the other hand, when the slave SQL_THREAD is the source of replication delays it is probably because of queries coming from the replication stream are taking too long to execute on the slave. This is sometimes because of different hardware between master/slave, different schema indexes, workload. Moreover, the slave OLTP workload sometimes causes replication delays because of locking. For instance, if a long-running read against a MyISAM table blocks the SQL thread, or any transaction against an InnoDB table creates an IX lock and blocks DDL in the SQL thread. Also, take into account that slave is single threaded prior to MySQL 5.6, which would be another reason for delays on the slave SQL_THREAD.

Let me show you via master status/slave status example to identify either slave is lagging on slave IO_THREAD or slave SQL_THREAD.

mysql-master> SHOW MASTER STATUS;+------------------+--------------+------------------+------------------------------------------------------------------+| File | Position| Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set|+------------------+--------------+------------------+------------------------------------------------------------------+| mysql-bin.018196 | 15818564 || bb11b389-d2a7-11e3-b82b-5cf3fcfc8f58:1-2331947 |+------------------+--------------+------------------+------------------------------------------------------------------+mysql-slave> SHOW SLAVE STATUS/G*************************** 1. row ***************************Slave_IO_State: Queueing master event to the relay logMaster_Host: master.example.comMaster_User: replMaster_Port: 3306Connect_Retry: 60Master_Log_File: mysql-bin.018192Read_Master_Log_Pos: 10050480Relay_Log_File: mysql-relay-bin.001796Relay_Log_Pos: 157090Relay_Master_Log_File: mysql-bin.018192Slave_IO_Running: YesSlave_SQL_Running: YesReplicate_Do_DB:Replicate_Ignore_DB:Replicate_Do_Table:Replicate_Ignore_Table:Replicate_Wild_Do_Table:Replicate_Wild_Ignore_Table:Last_Errno: 0Last_Error:Skip_Counter: 0Exec_Master_Log_Pos: 5395871Relay_Log_Space: 10056139Until_Condition: NoneUntil_Log_File:Until_Log_Pos: 0Master_SSL_Allowed: NoMaster_SSL_CA_File:Master_SSL_CA_Path:Master_SSL_Cert:Master_SSL_Cipher:Master_SSL_Key:Seconds_Behind_Master: 230775Master_SSL_Verify_Server_Cert: NoLast_IO_Errno: 0Last_IO_Error:Last_SQL_Errno: 0Last_SQL_Error:Replicate_Ignore_Server_Ids:Master_Server_Id: 2Master_UUID: bb11b389-d2a7-11e3-b82b-5cf3fcfc8f58:2-973166Master_Info_File: /var/lib/mysql/i1/data/master.infoSQL_Delay: 0SQL_Remaining_Delay: NULLSlave_SQL_Running_State: Reading event from the relay logMaster_Retry_Count: 86400Master_Bind:Last_IO_Error_Timestamp:Last_SQL_Error_Timestamp:Master_SSL_Crl:Master_SSL_Crlpath:Retrieved_Gtid_Set: bb11b389-d2a7-11e3-b82b-5cf3fcfc8f58:2-973166Executed_Gtid_Set: bb11b389-d2a7-11e3-b82b-5cf3fcfc8f58:2-973166,ea75c885-c2c5-11e3-b8ee-5cf3fcfc9640:1-1370Auto_Position: 1

mysql-master>SHOWMASTERSTATUS;

+------------------+--------------+------------------+------------------------------------------------------------------+

+------------------+--------------+------------------+------------------------------------------------------------------+

+------------------+--------------+------------------+------------------------------------------------------------------+

mysql-slave>SHOWSLAVESTATUS/G

***************************1.row***************************

Slave_IO_State:Queueingmastereventtotherelaylog

Master_Host:master.example.com

Master_User:repl

Master_Port:3306

Connect_Retry:60

Master_Log_File:mysql-bin.018192

Read_Master_Log_Pos:10050480

Relay_Log_File:mysql-relay-bin.001796

Relay_Log_Pos:157090

Relay_Master_Log_File:mysql-bin.018192

Slave_IO_Running:Yes

Slave_SQL_Running:Yes

Replicate_Do_DB:

Replicate_Ignore_DB:

Replicate_Do_Table:

Replicate_Ignore_Table:

Replicate_Wild_Do_Table:

Replicate_Wild_Ignore_Table:

Last_Errno:0Last_Error:Skip_Counter:0

Exec_Master_Log_Pos:5395871

Relay_Log_Space:10056139

Until_Condition:None

Until_Log_File:

Until_Log_Pos:0

Master_SSL_Allowed:No

Master_SSL_CA_File:

Master_SSL_CA_Path:

Master_SSL_Cert:

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master:230775

Master_SSL_Verify_Server_Cert:No

Last_IO_Errno:0

Last_IO_Error:

Last_SQL_Errno:0

Last_SQL_Error:

Replicate_Ignore_Server_Ids:

Master_Server_Id:2

Master_UUID:bb11b389-d2a7-11e3-b82b-5cf3fcfc8f58:2-973166

Master_Info_File:/var/lib/mysql/i1/data/master.info

SQL_Delay:0

SQL_Remaining_Delay:NULL

Slave_SQL_Running_State:Readingeventfromtherelaylog

Master_Retry_Count:86400

Master_Bind:

Last_IO_Error_Timestamp:

Last_SQL_Error_Timestamp:

Master_SSL_Crl:

Master_SSL_Crlpath:

Retrieved_Gtid_Set:bb11b389-d2a7-11e3-b82b-5cf3fcfc8f58:2-973166

Executed_Gtid_Set:bb11b389-d2a7-11e3-b82b-5cf3fcfc8f58:2-973166,

ea75c885-c2c5-11e3-b8ee-5cf3fcfc9640:1-1370

Auto_Position:1

This clearly suggests that the slave IO_THREAD is lagging and obviously because of that the slave SQL_THREAD is lagging, too, and it yields replication delays. As you can see the Master log file ismysql-bin.018196(File parameter from master status)and slave IO_THREAD is on mysql-bin.018192(Master_Log_File from slave status) which indicates slave IO_THREAD is reading from that file, while on master it’s writing onmysql-bin.018196, so the slave IO_THREAD is behind by4binlogs. Meanwhile, the slave SQL_THREAD is reading from same file i.e.mysql-bin.018192(Relay_Master_Log_File from slave status)This indicates that the slave SQL_THREAD is applying events fast enough, but it’s lagging too, which can be observed from the difference betweenRead_Master_Log_Pos& Exec_Master_Log_Posfrom show slave status output.

You can calculate slave SQL_THREAD lag fromRead_Master_Log_Pos–Exec_Master_Log_Posin general as long asMaster_Log_Fileparameter output from show slave status and Relay_Master_Log_Fileparameter from show slave status output are the same. This will give you rough idea how fast slave SQL_THREAD is applying events. As I mentioned above, the slave IO_THREAD is lagging as in this example then off course slave SQL_THREAD is behind too. You can read detailed description of show slave status output fieldshere.

Also, theSeconds_Behind_Masterparameter shows a huge delay in seconds. However, this can be misleading, because it only measures the difference between the timestamps of the relay log most recently executed, versus the relay log entry most recently downloaded by the IO_THREAD. If there are more binlogs on the master, the slave doesn’t figure them into the calculation of Seconds_behind_master. You can get a more accurate measure of slave lag usingpt-heartbeatfromPercona Toolkit.So, we learned how to check replication delays – either it’s slave IO_THREAD or slave SQL_THREAD. Now, let me provide some tips and suggestions for what exactly causing this delay.

Tips and Suggestions What Causing Replication Delay & Possible Fixes
Usually, the slave IO_THREAD is behind because of slow network between master/slave. Most of the time, enabling slave_compressed_protocolhelps to mitigate slave IO_THREAD lag. One other suggestion is to disable binary logging on slave as it’s IO intensive too unless you required it for point in time recovery.

To minimize slave SQL_THREAD lag, focus on query optimization. My recommendation is to enable the configuration optionlog_slow_slave_statementsso that the queries executed by slave that take more thanlong_query_timewill be logged to the slow log. To gather more information about query performance, I would also recommend setting the configuration optionlog_slow_verbosityto “full”.

This way we can see if there are queries executed by slave SQL_thread that are taking long time to complete. You can follow my previous post about how to enable slow query log for specific time period with mentioned optionshere. And as a reminder, log_slow_slave_statements as variable were first introduced inPercona Server5.1 which is now part of vanilla MySQL from version 5.6.11 In upstream version of MySQL Server log_slow_slave_statements were introduced as command line option. Details can be foundherewhile log_slow_verbosity is Percona Server specific feature.

One another reason for delay on slave SQL_THREAD if you use row based binlog format is that if your any database table missing primary key or unique key then it will scan all rows of the table for DML on slave and causes replication delays so make sure all your tables should have primary key or unique key. Check this bug report for detailshttp://bugs.mysql.com/bug.php?id=53375 You can use below query on slave to identify which of database tables missing primary or unique key.

mysql> SELECT t.table_schema,t.table_name,engineFROM information_schema.tables t INNER JOIN information_schema .columns con t.table_schema=c.table_schema and t.table_name=c.table_nameGROUP BY t.table_schema,t.table_nameHAVING sum(if(column_key in ('PRI','UNI'), 1,0)) =0;

mysql>SELECTt.table_schema,t.table_name,engine

FROMinformation_schema.tablestINNERJOINinformation_schema.columnsc

ont.table_schema=c.table_schemaandt.table_name=c.table_name

GROUPBYt.table_schema,t.table_name

HAVINGsum(if(column_keyin('PRI','UNI'),1,0))=0;

One improvement is made for this case in MySQL 5.6, where in memory hash is used slave_rows_search_algorithmscomes to the rescue.

Note that Seconds_Behind_Master is not updated while we read huge RBR event, So, “lagging” may be related to just that – we had not completed reading of the event. For example, in row based replication huge transactions may cause delay on slave side e.g. if you have 10 million rows table and you do “DELETE FROM table WHERE id 5Mrows will be sent to slave, each row separately which will be painfully slow. So, if you have to delete oldest rows time to time from huge table using Partitioning might be good alternative for this for some kind of workloads where instead using DELETE use DROP old partition may be good and only statement is replicated because it will be DDL operation.

To explain it better, let suppose you havepartition1holding rows of ID’s from 1to1000000,partition2– ID’s from1000001to2000000and so on so instead of deleting via statement “DELETE FROM table WHERE ID

pt-stalkis one of finest tool fromPercona Toolkitwhich collects diagnostics data when problems occur. You can setup pt-stalk as follows sowhenever there is a slave lag it can log diagnostic information which we can be later analyze to check to see what exactly causing the lag.

Here is how you can setup pt-stalk so that it captures diagnostic data when there is slave lag:

------- pt-plug.sh contents#!/bin/bashtrg_plugin() {mysqladmin $EXT_ARGV ping &> /dev/nullmysqld_alive=$?if [[ $mysqld_alive == 0 ]]thenseconds_behind_master=$(mysql $EXT_ARGV -e "show slave status" --vertical | grep Seconds_Behind_Master | awk '{print $2}')echo $seconds_behind_masterelseecho 1fi}# Uncomment below to test that trg_plugin function works as expected#trg_plugin--------- That's the pt-plug.sh file you would need to create and then use it as below with pt-stalk:$ /usr/bin/pt-stalk --function=/root/pt-plug.sh --variable=seconds_behind_master --threshold=300 --cycles=60 --notify-by-email=muhammad@example.com --log=/root/pt-stalk.log --pid=/root/pt-stalk.pid --daemonize

-------pt-plug.shcontents

#!/bin/bashtrg_plugin(){

mysqladmin$EXT_ARGVping&>/dev/null

mysqld_alive=$?

if[[$mysqld_alive==0]]

then

seconds_behind_master=$(mysql$EXT_ARGV-e"show slave status"--vertical|grepSeconds_Behind_Master|awk'{print $2}')

echo$seconds_behind_master

elseecho1fi}

# Uncomment below to test that trg_plugin function works as expected

#trg_plugin-------

--That'sthept-plug.shfileyouwouldneedtocreateandthenuseitasbelowwithpt-stalk:

$/usr/bin/pt-stalk--function=/root/pt-plug.sh--variable=seconds_behind_master--threshold=300--cycles=60--notify-by-email=muhammad@example.com--log=/root/pt-stalk.log--pid=/root/pt-stalk.pid--daemonize

You can adjust the threshold, currently its300seconds, combining that with –cycles, it means that if seconds_behind_master value is >=300for60seconds or more then pt-stalk will start capturing data. Adding–notify-by-emailoption will notify via email when pt-stalk captures data. You can adjust the pt-stalk thresholds accordingly so that’s how it triggers to collect diagnostic data during problem.

Conclusion
A lagging slave is a tricky problem but a common issue in MySQL replication. I’ve tried to cover most aspects of replication delays in this post. Please let me know in the comments section if you know of any other reasons for replication delay.

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

MySQL: BLOB and other no-sql storage, what are the differences?May 13, 2025 am 12:14 AM

MySQL'sBLOBissuitableforstoringbinarydatawithinarelationaldatabase,whileNoSQLoptionslikeMongoDB,Redis,andCassandraofferflexible,scalablesolutionsforunstructureddata.BLOBissimplerbutcanslowdownperformancewithlargedata;NoSQLprovidesbetterscalabilityand

MySQL Add User: Syntax, Options, and Security Best PracticesMay 13, 2025 am 12:12 AM

ToaddauserinMySQL,use:CREATEUSER'username'@'host'IDENTIFIEDBY'password';Here'showtodoitsecurely:1)Choosethehostcarefullytocontrolaccess.2)SetresourcelimitswithoptionslikeMAX_QUERIES_PER_HOUR.3)Usestrong,uniquepasswords.4)EnforceSSL/TLSconnectionswith

MySQL: How to avoid String Data Types common mistakes?May 13, 2025 am 12:09 AM

ToavoidcommonmistakeswithstringdatatypesinMySQL,understandstringtypenuances,choosetherighttype,andmanageencodingandcollationsettingseffectively.1)UseCHARforfixed-lengthstrings,VARCHARforvariable-length,andTEXT/BLOBforlargerdata.2)Setcorrectcharacters

MySQL: String Data Types and ENUMs?May 13, 2025 am 12:05 AM

MySQloffersechar, Varchar, text, Anddenumforstringdata.usecharforfixed-Lengthstrings, VarcharerForvariable-Length, text forlarger text, AndenumforenforcingdataAntegritywithaetofvalues.

MySQL BLOB: how to optimize BLOBs requestsMay 13, 2025 am 12:03 AM

Optimizing MySQLBLOB requests can be done through the following strategies: 1. Reduce the frequency of BLOB query, use independent requests or delay loading; 2. Select the appropriate BLOB type (such as TINYBLOB); 3. Separate the BLOB data into separate tables; 4. Compress the BLOB data at the application layer; 5. Index the BLOB metadata. These methods can effectively improve performance by combining monitoring, caching and data sharding in actual applications.

Adding Users to MySQL: The Complete TutorialMay 12, 2025 am 12:14 AM

Mastering the method of adding MySQL users is crucial for database administrators and developers because it ensures the security and access control of the database. 1) Create a new user using the CREATEUSER command, 2) Assign permissions through the GRANT command, 3) Use FLUSHPRIVILEGES to ensure permissions take effect, 4) Regularly audit and clean user accounts to maintain performance and security.

Mastering MySQL String Data Types: VARCHAR vs. TEXT vs. CHARMay 12, 2025 am 12:12 AM

ChooseCHARforfixed-lengthdata,VARCHARforvariable-lengthdata,andTEXTforlargetextfields.1)CHARisefficientforconsistent-lengthdatalikecodes.2)VARCHARsuitsvariable-lengthdatalikenames,balancingflexibilityandperformance.3)TEXTisidealforlargetextslikeartic

MySQL: String Data Types and Indexing: Best PracticesMay 12, 2025 am 12:11 AM

Best practices for handling string data types and indexes in MySQL include: 1) Selecting the appropriate string type, such as CHAR for fixed length, VARCHAR for variable length, and TEXT for large text; 2) Be cautious in indexing, avoid over-indexing, and create indexes for common queries; 3) Use prefix indexes and full-text indexes to optimize long string searches; 4) Regularly monitor and optimize indexes to keep indexes small and efficient. Through these methods, we can balance read and write performance and improve database efficiency.

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Roblox: Grow A Garden - Complete Mutation Guide

3 weeks agoByDDD

How to fix KB5055612 fails to install in Windows 10?

3 weeks agoByDDD

Roblox: Bubble Gum Simulator Infinity - How To Get And Use Royal Keys

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Mandragora: Whispers Of The Witch Tree - How To Unlock The Grappling Hook

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Nordhold: Fusion System, Explained

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.