Home  >  Article  >  Database  >  How to solve the problem of connection failure caused by MySQL using ReplicationConnection

How to solve the problem of connection failure caused by MySQL using ReplicationConnection

PHPz
PHPzforward
2023-05-26 13:10:321428browse

Introduction

MySQL database read-write separation is one of the common methods to improve service quality. For technical solutions, there are many mature open source frameworks or solutions, such as: sharding-jdbc, AbstractRoutingDatasource in spring, MySQL-Router, etc., and ReplicationConnection in mysql-jdbc can also support it.

This article will not do too much analysis on the technical selection of read-write separation. It only explores the reasons for connection failure when using druid as a data source and combining it with ReplicationConnection for read-write separation, and finds a simple and effective solution. solution.

Problem Background

Due to historical reasons, certain services have connection failure exceptions. The key error reports are as follows:

How to solve the problem of connection failure caused by MySQL using ReplicationConnection

It can be inferred from the logs This is because the connection has not interacted with the MySQL server for a long time, causing the server to close the connection, which is a typical connection failure situation.

Main configuration involved

jdbc configuration

##jdbc:mysql:replication://master_host:port,slave_host:port/database_name

druid configuration

testWhileIdle=true (i.e., idle connection check is enabled);

timeBetweenEvictionRunsMillis=6000L (i.e. , for the scenario of obtaining a connection, if a connection is idle for more than 1 minute, it will be checked. If the connection is invalid, it will be discarded and reacquired).

Attachment: The

processing logic in DruidDataSource.getConnectionDirect is as follows:

if (testWhileIdle) {
    final DruidConnectionHolder holder = poolableConnection.holder;
    long currentTimeMillis             = System.currentTimeMillis();
    long lastActiveTimeMillis          = holder.lastActiveTimeMillis;
    long lastExecTimeMillis            = holder.lastExecTimeMillis;
    long lastKeepTimeMillis            = holder.lastKeepTimeMillis;
    if (checkExecuteTime
            && lastExecTimeMillis != lastActiveTimeMillis) {
        lastActiveTimeMillis = lastExecTimeMillis;
    }
    if (lastKeepTimeMillis > lastActiveTimeMillis) {
        lastActiveTimeMillis = lastKeepTimeMillis;
    }
    long idleMillis    = currentTimeMillis - lastActiveTimeMillis;
    long timeBetweenEvictionRunsMillis = this.timeBetweenEvictionRunsMillis;
    if (timeBetweenEvictionRunsMillis <= 0) {
        timeBetweenEvictionRunsMillis = DEFAULT_TIME_BETWEEN_EVICTION_RUNS_MILLIS;
    }
    if (idleMillis >= timeBetweenEvictionRunsMillis
            || idleMillis < 0 // unexcepted branch
            ) {
        boolean validate = testConnectionInternal(poolableConnection.holder, poolableConnection.conn);
        if (!validate) {
            if (LOG.isDebugEnabled()) {
                LOG.debug("skip not validate connection.");
            }
            discardConnection(poolableConnection.holder);
             continue;
        }
    }
}

mysql timeout parameter configuration wait_timeout=3600 (3600 seconds, that is : If a connection has not interacted with the server for more than an hour, the connection will be killed by the server). Obviously, based on the above configuration, according to conventional understanding, the problem "The last packet successfully received from server was xxx,xxx,xxx milliseconds ago" should not occur. (Of course, the possibility of manual intervention to kill the database connection was also ruled out at that time). When the "supposed" experience cannot explain the problem, it is often necessary to jump out of the shackles of superficial experience and get to the bottom of it. So, what is the real cause of this problem?

Essential reason

When druid is used to manage data sources and combined with the native ReplicationConnection in mysql-jdbc for read-write separation, there are actually two sets of connections, master and slaves, in the ReplicationConnection proxy object. Druid is in When doing connection detection, only the master connection can be detected. If a slave connection is not used for a long time, it will cause connection failure.

Cause analysis

In mysql-jdbc, the database driver’s connection processing process
Combined with the com.mysql.jdbc.Driver source code, it is not difficult to see that mysql-jdbc obtains The main process of the connection is as follows:

How to solve the problem of connection failure caused by MySQL using ReplicationConnection

For the jdbc-url configured starting with "jdbc:mysql:replication://", the connection obtained through mysql-jdbc, in fact It is a proxy object of ReplicationConnection. By default, the first host and port after "jdbc:mysql:replication://" correspond to the master connection, and the subsequent host and port correspond to the slaves connection. For multiple slave configurations, scenarios, the random policy is used for load balancing by default.

ReplicationConnection proxy object is generated using JDK dynamic proxy. The specific implementation of InvocationHandler is ReplicationConnectionProxy. The key code is as follows:

public static ReplicationConnection createProxyInstance(List<String> masterHostList, Properties masterProperties, List<String> slaveHostList,
            Properties slaveProperties) throws SQLException {
      ReplicationConnectionProxy connProxy = new ReplicationConnectionProxy(masterHostList, masterProperties, slaveHostList, slaveProperties);
      return (ReplicationConnection) java.lang.reflect.Proxy.newProxyInstance(ReplicationConnection.class.getClassLoader(), INTERFACES_TO_PROXY, connProxy);
 }

Important components of ReplicationConnectionProxy
About database connection Agent, the main components in ReplicationConnectionProxy are as follows:

How to solve the problem of connection failure caused by MySQL using ReplicationConnection

ReplicationConnectionProxy has two actual connection objects, masterConnection and slavesConnection. CurrentConnetion (current connection) can be switched to masterConnection or slavesConnection, switching method This can be achieved by setting readOnly.

In business logic, the core of realizing read-write separation is also here. To put it simply: when using ReplicationConnection to separate read-write, just do an aop that "sets the readOnly attribute of the connection".

Based on ReplicationConnectionProxy, the Connection proxy object obtained in the business logic, what is the main logic during database access?

ReplicationConnection proxy object processing process
For business logic, the obtained Connection instance is the ReplicationConnection proxy object, which completes the processing of database access through ReplicationConnectionProxy and ReplicationMySQLConnection. Among them, ReplicationConnectionProxy not only implements InvocationHandler, but also plays the role of connection management. The core logic is as follows:

How to solve the problem of connection failure caused by MySQL using ReplicationConnection

对于prepareStatement等常规逻辑,ConnectionMySQConnection获取到当前连接进行处理(普通的读写分离的处理的重点正是在此);此时,重点提及pingInternal方法,其处理方式也是获取当前连接,然后执行pingInternal逻辑。

对于ping()这个特殊逻辑,图中描述相对简单,但主体含义不变,即:对master连接和sleves连接都要进行ping()的处理。

图中,pingInternal流程和druid的MySQ连接检查有关,而ping的特殊处理,也正是解决问题的关键。

druid数据源对MySQ连接的检查

druid中对MySQL连接检查的默认实现类是MySqlValidConnectionChecker,其中核心逻辑如下:

public boolean isValidConnection(Connection conn, String validateQuery, int validationQueryTimeout) throws Exception {
    if (conn.isClosed()) {
        return false;
    }
    if (usePingMethod) {
        if (conn instanceof DruidPooledConnection) {
            conn = ((DruidPooledConnection) conn).getConnection();
        }
        if (conn instanceof ConnectionProxy) {
            conn = ((ConnectionProxy) conn).getRawObject();
        }
        if (clazz.isAssignableFrom(conn.getClass())) {
            if (validationQueryTimeout <= 0) {
                validationQueryTimeout = DEFAULT_VALIDATION_QUERY_TIMEOUT;
            }
            try {
                ping.invoke(conn, true, validationQueryTimeout * 1000);
            } catch (InvocationTargetException e) {
                Throwable cause = e.getCause();
                if (cause instanceof SQLException) {
                    throw (SQLException) cause;
                }
                throw e;
            }
            return true;
        }
    }
    String query = validateQuery;
    if (validateQuery == null || validateQuery.isEmpty()) {
        query = DEFAULT_VALIDATION_QUERY;
    }
    Statement stmt = null;
    ResultSet rs = null;
    try {
        stmt = conn.createStatement();
        if (validationQueryTimeout > 0) {
            stmt.setQueryTimeout(validationQueryTimeout);
        }
        rs = stmt.executeQuery(query);
        return true;
    } finally {
        JdbcUtils.close(rs);
        JdbcUtils.close(stmt);
    }
}

对应服务中使用的mysql-jdbc(5.1.45版),在未设置“druid.mysql.usePingMethod”系统属性的情况下,默认usePingMethod为true,如下:

public MySqlValidConnectionChecker(){
try {
        clazz = Utils.loadClass("com.mysql.jdbc.MySQLConnection");
        if (clazz == null) {
            clazz = Utils.loadClass("com.mysql.cj.jdbc.ConnectionImpl");
        }
        if (clazz != null) {
            ping = clazz.getMethod("pingInternal", boolean.class, int.class);
        }
        if (ping != null) {
            usePingMethod = true;
        }
    } catch (Exception e) {
        LOG.warn("Cannot resolve com.mysql.jdbc.Connection.ping method.  Will use &#39;SELECT 1&#39; instead.", e);
    }
    configFromProperties(System.getProperties());
}
@Override
public void configFromProperties(Properties properties) {
    String property = properties.getProperty("druid.mysql.usePingMethod");
    if ("true".equals(property)) {
        setUsePingMethod(true);
    } else if ("false".equals(property)) {
        setUsePingMethod(false);
    }
}

同时,可以看出MySqlValidConnectionChecker中的ping方法使用的是MySQLConnection中的pingInternal方法,而该方法,结合上面对ReplicationConnection的分析,当调用pingInternal时,只是对当前连接进行检验。执行检验连接的时机是通过DrduiDatasource获取连接时,此时未设置readOnly属性,检查的连接,其实只是ReplicationConnectionProxy中的master连接。

此外,如果通过“druid.mysql.usePingMethod”属性设置usePingMeghod为false,其实也会导致连接失效的问题,因为:当通过valideQuery(例如“select 1”)进行连接校验时,会走到ReplicationConnection中的普通查询逻辑,此时对应的连接依然是master连接。

题外一问:ping方法为什么使用“pingInternal”,而不是常规的ping?

原因:pingInternal预留了超时时间等控制参数。

解决方式

调整依赖版本

在服务中,使用的MySQL JDBC版本是5.1.45,并且使用的Druid版本是1.1.20。经过对其他高版本依赖的了解,依然存在该问题。

修改读写分离实现

修改的工作量主要在于数据源配置和aop调整,但需要一定的整体回归验证成本,鉴于涉及该问题的服务重要性一般,暂不做大调整。

拓展mysql-jdbc驱动

基于原有ReplicationConnection的功能,拓展pingInternal调整为普通的ping,集成原有Driver拓展新的Driver。方案可行,但修改成本不算小。

基于druid,拓展MySQL连接检查

为简单高效解决问题,选择拓展MySqlValidConnectionChecker,并在druid数据源中加上对应配置即可。拓展如下:

public class MySqlReplicationCompatibleValidConnectionChecker extends MySqlValidConnectionChecker {
    private static final Log LOG = LogFactory.getLog(MySqlValidConnectionChecker.class);
    /**
     * 
     */
    private static final long serialVersionUID = 1L;
    @Override
    public boolean isValidConnection(Connection conn, String validateQuery, int validationQueryTimeout) throws Exception {
        if (conn.isClosed()) {
            return false;
        }
        if (conn instanceof DruidPooledConnection) {
            conn = ((DruidPooledConnection) conn).getConnection();
        }
        if (conn instanceof ConnectionProxy) {
            conn = ((ConnectionProxy) conn).getRawObject();
        }
        if (conn instanceof ReplicationConnection) {
            try {
                ((ReplicationConnection) conn).ping();
                LOG.info("validate connection success: connection=" + conn.toString());
                return true;
            } catch (SQLException e) {
                LOG.error("validate connection error: connection=" + conn.toString(), e);
                throw e;
            }
        }
        return super.isValidConnection(conn, validateQuery, validationQueryTimeout);
    }
}

ReplicatoinConnection.ping()的实现逻辑中,会对所有master和slaves连接进行ping操作,最终每个ping操作都会调用到LoadBalancedConnectionProxy.doPing进行处理,而此处,可在数据库配置url中设置loadBalancePingTimeout属性设置超时时间。

The above is the detailed content of How to solve the problem of connection failure caused by MySQL using ReplicationConnection. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:yisu.com. If there is any infringement, please contact admin@php.cn delete
Previous article:How to modify mysql timeNext article:How to modify mysql time