MySQL Connector/J Source Code Analysis (Failover)

2023-10-27

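Contents

Preface
1. What Is Failover?
2. Main Structure of Failover
3. Exception Handling
- 3.1 Connection Construction Phase
  - Summary
- 3.2 Connection Usage Phase
  - Summary
4. The Four-Condition Check
- Summary
5. Failover-Specific Options
6. The Official Stance
Conclusion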

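Preface

This article examines the failover module of Connector/J. It assesses how useful failover is in practice by drawing two distinctions: between the connection construction phase and the connection usage phase, and between communication exceptions and data exceptions.

The analysis is based on version 5.1.46. To download it through Maven, add the following dependency: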

<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.46</version>
</dependency>

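Our example of obtaining a connection looks like this:

    Connection conn = null;
    String DB_URL = "jdbc:mysql://ip1:port1,ip2:port2,ip3:port3/dbname";
    try {
        // Register the JDBC driver
        Class.forName("com.mysql.jdbc.Driver");

        // Open the connection (USER and PASS hold the database credentials)
        conn = DriverManager.getConnection(DB_URL, USER, PASS);
    ....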

1. What Is Failover?

The official documentation describes failover in two paragraphs:

MySQL Connector/J supports server failover. A failover happens when connection-related errors occur for an underlying, active connection. The connection errors are, by default, propagated to the client, which has to handle them by, for example, recreating the working objects (Statement, ResultSet, etc.) and restarting the processes. Sometimes, the driver might eventually fall back to the original host automatically before the client application continues to run, in which case the host switch is transparent and the client application will not even notice it.

A connection using failover support works just like a standard connection: the client does not experience any disruptions in the failover process. This means the client can rely on the same connection instance even if two successive statements might be executed on two different physical hosts. However, this does not mean the client does not have to deal with the exception that triggered the server switch.

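In short: normally, when a connection fails, the caller has to abandon the current connection, obtain a new one, and redo the earlier operations. With failover, the driver steps in when the underlying connection runs into trouble, so the caller's current work does not have to be torn down and started over. The caller only needs to hold on to one connection, even though two consecutive SQL statements may execute on two different hosts.

Going by the official description, life looks easy for the caller: hold one connection and you are set. But think it through: if the current operation is transactional and two consecutive statements run on different hosts, does the transaction still hold? The last sentence of the official text says that failover does not relieve the caller of handling the exception. So what exactly must the caller do, and when? The following sections answer these questions one by one.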

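2. Main Structure of Failover

Its internal structure is shown as a UML class diagram:

[Figure: UML class diagram of the failover module]

The main components do the following:

- MysqlIO: establishes the TCP connection to the MySQL server.
- ConnectionImpl, JDBC4Connection: control the connection to the MySQL server through MysqlIO and implement the interface methods for data operations. They also set and record various connection timings.
- MultiHostMySQLConnection, JDBC4MultiHostMySQLConnection: related by inheritance. They obtain the JDBC4Connection object through the proxy object FailoverConnectionProxy and call the corresponding interface methods on it (the methods declared from JDBC4MySQLConnection up to java.sql.Connection).
- MultiHostConnectionProxy: the parent class of the various dynamic proxy classes. It implements their common methods, the most frequently used one being the method that returns the current connection object to MultiHostMySQLConnection and its subclasses. It also implements the InvocationHandler interface directly: it provides invoke and declares the abstract method invokeMore, which subclasses implement. The implementation of invoke follows the template method design pattern. Together, invoke and the subclass's invokeMore realize the proxy pattern, adding behavior before and after the proxied method executes.
- FailoverConnectionProxy: a subclass of MultiHostConnectionProxy that implements invokeMore. When an exception occurs, it replaces the current connection object through the failOver method, either before or after the proxied method runs.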
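The cooperation between invoke and invokeMore described above can be illustrated with a minimal sketch. Nothing here is Connector/J code: Greeter, TemplateHandler, ForwardingHandler and ProxySketch are hypothetical names, and the "before"/"after" trace entries merely stand in for whatever shared work the real invoke performs around invokeMore.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the proxied JDBC interface.
interface Greeter {
    String greet(String name);
}

// Plays the role of MultiHostConnectionProxy: invoke() is the template method.
abstract class TemplateHandler implements InvocationHandler {
    final List<String> trace = new ArrayList<>();

    @Override
    public final Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        trace.add("before " + method.getName());          // shared step before the call
        Object result = invokeMore(proxy, method, args);  // subclass-specific step
        trace.add("after " + method.getName());           // shared step after the call
        return result;
    }

    abstract Object invokeMore(Object proxy, Method method, Object[] args) throws Throwable;
}

// Plays the role of FailoverConnectionProxy: forwards the call to the current target.
class ForwardingHandler extends TemplateHandler {
    private final Greeter target;

    ForwardingHandler(Greeter target) {
        this.target = target;
    }

    @Override
    Object invokeMore(Object proxy, Method method, Object[] args) throws Throwable {
        return method.invoke(target, args);
    }
}

class ProxySketch {
    // Creates the dynamic proxy, as createProxyInstance does for Connection.
    static Greeter wrap(ForwardingHandler handler) {
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(), new Class<?>[] { Greeter.class }, handler);
    }

    public static void main(String[] args) {
        ForwardingHandler handler = new ForwardingHandler(name -> "hello " + name);
        Greeter proxy = wrap(handler);
        System.out.println(proxy.greet("db"));
        System.out.println(handler.trace);
    }
}
```

Because the caller only ever holds the proxy, the handler is free to swap its target between calls, which is the property failover depends on.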

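3. Exception Handling

From the official description, failover is meant to address the case where a failure of the underlying connection brings down the caller's current operation. We look at the construction phase and the usage phase of a connection in turn, and for each phase we analyze how communication exceptions and data exceptions are handled differently.

3.1 Connection Construction Phase

From the caller's DriverManager.getConnection call down to FailoverConnectionProxy#createProxyInstance, the methods invoked along the way can throw exceptions, but these are all handled on the spot and are not propagated to the caller. The call chain is as follows:

[Figure: call chain from DriverManager.getConnection to FailoverConnectionProxy#createProxyInstance]

Next, look at FailoverConnectionProxy#createProxyInstance: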

public static Connection createProxyInstance(List<String> hosts, Properties props) throws SQLException {
        FailoverConnectionProxy connProxy = new FailoverConnectionProxy(hosts, props);

        return (Connection) java.lang.reflect.Proxy.newProxyInstance(Connection.class.getClassLoader(), INTERFACES_TO_PROXY, connProxy);
    }

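The method contains only two statements. The first constructs the FailoverConnectionProxy object; from the method signature we know it can throw SQLException. The second creates the dynamic proxy and may throw IllegalArgumentException, which is a subclass of RuntimeException. So we only need to look closely at how the FailoverConnectionProxy object is constructed. Its constructor calls pickNewConnection: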

@Override
    synchronized void pickNewConnection() throws SQLException {
        if (this.isClosed && this.closedExplicitly) {
            return;
        }

        if (!isConnected() || readyToFallBackToPrimaryHost()) {
            try {
                connectTo(this.primaryHostIndex);
            } catch (SQLException e) {
                resetAutoFallBackCounters();
                failOver(this.primaryHostIndex);
            }
        } else {
            failOver();
        }
    }

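The structure of the code and the method names suggest what this method does. Because we are in the construction phase, !isConnected() is certainly true. The method then tries to connect to the first ip:port pair in the URL. If an exception occurs (as shown in "MySQL Connector/J Source Code Analysis (Ordinary Connection)", communication exceptions and data exceptions are both wrapped as SQLException), it is caught and failOver is executed.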

private synchronized void failOver(int failedHostIdx) throws SQLException {
        
    ....

        do {
            try {
                firstConnOrPassedByPrimaryHost = firstConnOrPassedByPrimaryHost || isPrimaryHostIndex(nextHostIndex);

                connectTo(nextHostIndex);

                if (firstConnOrPassedByPrimaryHost && connectedToSecondaryHost()) {
                    resetAutoFallBackCounters();
                }
                gotConnection = true;

            } catch (SQLException e) {
                lastExceptionCaught = e;

                if (shouldExceptionTriggerConnectionSwitch(e)) {
                    
                    ....

                    nextHostIndex = newNextHostIndex;

                } else {
                    throw e;
                }
            }
        } while (attempts < this.retriesAllDown && !gotConnection);

        if (!gotConnection) {
            throw lastExceptionCaught;
        }
    }

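The method contains one large do-while loop. It keeps looping while the retry rounds are not exhausted and no underlying connection has been established. After the loop, if there is still no connection, the most recent exception is thrown.

Inside the loop, exceptions are caught and classified by shouldExceptionTriggerConnectionSwitch; an exception that does not qualify is rethrown immediately. Here is how shouldExceptionTriggerConnectionSwitch classifies an exception: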

@Override
    boolean shouldExceptionTriggerConnectionSwitch(Throwable t) {
        if (!(t instanceof SQLException)) {
            return false;
        }

        String sqlState = ((SQLException) t).getSQLState();
        if (sqlState != null) {
            if (sqlState.startsWith("08")) {
                // connection error
                return true;
            }
        }

        // Always handle CommunicationsException
        if (t instanceof CommunicationsException) {
            return true;
        }

        return false;
    }

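As the code shows, communication exceptions qualify. So failOver throws in two scenarios:

- A non-communication exception is thrown immediately.
- A communication exception is thrown once the configured number of rounds is exhausted and no connection has been established.

Summary

So both data exceptions and communication exceptions can occur while the dynamic proxy connection is being constructed. A data exception reaches the caller early, whereas a communication exception may never reach the caller; if it does, none of the ip:port pairs in the URL could be connected. Overall, the caller is able to perceive exceptions.

An interesting finding: with a wrong user name or password, the call chain shows that pickNewConnection calls connectTo to establish a connection. Because the credentials are wrong, the lower layer throws SQLException. The exception is caught, failOver is entered and connectTo is called again, and only this second exception is thrown straight to the caller. So two connection attempts are made in this case. As "MySQL Connector/J Source Code Analysis (Ordinary Connection)" showed, every connection attempt costs time and resources, so it might be better for pickNewConnection to rethrow the exception to the caller as soon as it recognizes a data exception.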
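The SQLState half of this classification can be tried out without the driver on the classpath. The sketch below is a simplification: SwitchTriggerSketch and isConnectionError are hypothetical names, and the CommunicationsException branch of the real method is left out on purpose to avoid the driver dependency. The SQLState values 08S01 and 28000 are only sample inputs.

```java
import java.sql.SQLException;

final class SwitchTriggerSketch {
    // Mirrors only the SQLState check of shouldExceptionTriggerConnectionSwitch:
    // states in class "08" are connection errors. The CommunicationsException
    // check of the real method is omitted (assumption: no driver on the classpath).
    static boolean isConnectionError(Throwable t) {
        if (!(t instanceof SQLException)) {
            return false;
        }
        String sqlState = ((SQLException) t).getSQLState();
        return sqlState != null && sqlState.startsWith("08");
    }

    public static void main(String[] args) {
        // A communication-type failure versus an authentication-type failure.
        System.out.println(isConnectionError(new SQLException("link down", "08S01")));
        System.out.println(isConnectionError(new SQLException("access denied", "28000")));
    }
}
```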
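The two-attempt behavior can be reproduced with a stripped-down model of pickNewConnection and failOver. This is a sketch under stated assumptions, not the driver's code: ConstructionPhaseSketch and Connector are hypothetical names, the host rotation is a plain round-robin, one "round" is counted each time the rotation returns to the first host tried, and the pause the driver may insert between rounds is not modeled.

```java
import java.sql.SQLException;
import java.util.concurrent.atomic.AtomicInteger;

final class ConstructionPhaseSketch {
    // Hypothetical stand-in for connectTo(hostIndex).
    interface Connector {
        void connectTo(int hostIndex) throws SQLException;
    }

    static boolean isConnectionError(SQLException e) {
        String state = e.getSQLState();
        return state != null && state.startsWith("08");
    }

    // Simplified pickNewConnection for the construction phase: try the primary first.
    static void pickNewConnection(Connector c, int hostCount, int retriesAllDown) throws SQLException {
        try {
            c.connectTo(0);
        } catch (SQLException e) {
            failOver(c, 0, hostCount, retriesAllDown);
        }
    }

    // Simplified failOver: rotate through the hosts, rethrowing non-connection errors.
    static void failOver(Connector c, int failedHostIdx, int hostCount, int retriesAllDown) throws SQLException {
        int next = (failedHostIdx + 1) % hostCount;
        int firstTried = next;
        int attempts = 0;
        boolean gotConnection = false;
        SQLException lastExceptionCaught = null;
        do {
            try {
                c.connectTo(next);
                gotConnection = true;
            } catch (SQLException e) {
                lastExceptionCaught = e;
                if (!isConnectionError(e)) {
                    throw e; // data-type error: no point trying other hosts
                }
                next = (next + 1) % hostCount;
                if (next == firstTried) {
                    attempts++; // one full round over the host list
                }
            }
        } while (attempts < retriesAllDown && !gotConnection);
        if (!gotConnection) {
            throw lastExceptionCaught;
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        try {
            pickNewConnection(i -> {
                calls.incrementAndGet();
                throw new SQLException("Access denied", "28000");
            }, 3, 120);
        } catch (SQLException e) {
            System.out.println("attempts before the caller saw the error: " + calls.get());
        }
    }
}
```

In this model a wrong password costs one attempt in pickNewConnection plus one in failOver before the caller sees the error, which matches the observation above.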

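3.2 Connection Usage Phase

Failover relies on dynamic proxies: every call to a method of the proxied interfaces goes through the invoke method of the InvocationHandler implementation. Here, MultiHostConnectionProxy, the parent class of FailoverConnectionProxy, implements that interface and its invoke method, whose body eventually calls FailoverConnectionProxy's invokeMore: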

@Override
    public synchronized Object invokeMore(Object proxy, Method method, Object[] args) throws Throwable {
        String methodName = method.getName();

        ....

        Object result = null;

        try {
            result = method.invoke(this.thisAsConnection, args);
            result = proxyIfReturnTypeIsJdbcInterface(method.getReturnType(), result);
        } catch (InvocationTargetException e) {
            dealWithInvocationException(e);
        }

        ....

        return result;
    }

(1) If a communication exception or a data exception occurs while executing conn.setAutoCommit(false), reflection throws InvocationTargetException, and control enters MultiHostConnectionProxy#dealWithInvocationException:

void dealWithInvocationException(InvocationTargetException e) throws SQLException, Throwable, InvocationTargetException {
        Throwable t = e.getTargetException();

        if (t != null) {
            if (this.lastExceptionDealtWith != t && shouldExceptionTriggerConnectionSwitch(t)) {
                invalidateCurrentConnection();
                pickNewConnection();
                this.lastExceptionDealtWith = t;
            }
            throw t;
        }
        throw e;
    }

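Here shouldExceptionTriggerConnectionSwitch is called again. If the exception stems from a communication error, pickNewConnection is called, which then enters failOver; if it stems from a data error, it is thrown directly. These methods were covered above. One addition: if failOver itself throws at this point, invokeMore does not catch it. The parent's MultiHostConnectionProxy#invoke does have catch blocks, but they only decide whether the exception needs to be wrapped and then keep throwing, so the exception goes straight to the caller: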

public synchronized Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        String methodName = method.getName();

        ....

        try {
            return invokeMore(proxy, method, args);
        } catch (InvocationTargetException e) {
            throw e.getCause() != null ? e.getCause() : e;
        } catch (Exception e) {
            // Check if the captured exception must be wrapped by an unchecked exception.
            Class<?>[] declaredException = method.getExceptionTypes();
            for (Class<?> declEx : declaredException) {
                if (declEx.isAssignableFrom(e.getClass())) {
                    throw e;
                }
            }
            throw new IllegalStateException(e.getMessage(), e);
        }
    }

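(2) Executing conn.createStatement() goes through proxyIfReturnTypeIsJdbcInterface. For classes whose package name starts with java.sql or javax.sql, this method wraps the result in FailoverJdbcInterfaceProxy (an inner class of FailoverConnectionProxy) and creates a dynamic proxy for it. The Statement implementation is therefore proxied as well. FailoverJdbcInterfaceProxy extends JdbcInterfaceProxy (an inner class of MultiHostConnectionProxy), which implements InvocationHandler and its invoke method. Wrapping with JdbcInterfaceProxy has something of the adapter pattern about it; readers can judge for themselves. The goal is that exceptions raised at run time by instances of these classes also reach MultiHostConnectionProxy#dealWithInvocationException for classification and are ultimately thrown.

Summary

During use, the caller is able to perceive exceptions. This is guaranteed by MultiHostConnectionProxy#dealWithInvocationException, which always rethrows the exception. The only difference between the two kinds is that a communication exception is rethrown after the underlying connection has been replaced, whereas a data exception is rethrown immediately.

4. The Four-Condition Check

Before Method#invoke is called, the underlying connection may be replaced once. This requires four conditions to hold.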
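On the caller's side, this behavior suggests a simple pattern: when a communication exception arrives, the proxy has already tried to replace the underlying connection, so the whole unit of work can be rerun on the same connection object. The sketch below is one possible shape for that, not something Connector/J provides: RetryOnFailoverSketch, SqlWork and runWithRetry are hypothetical names, and the "08" SQLState prefix is used as the only test for a communication exception.

```java
import java.sql.SQLException;

final class RetryOnFailoverSketch {
    // Hypothetical unit of work; it should create its own Statement objects
    // and, if transactional, start the transaction from the beginning.
    interface SqlWork<T> {
        T run() throws SQLException;
    }

    static <T> T runWithRetry(SqlWork<T> work, int maxAttempts) throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.run();
            } catch (SQLException e) {
                String state = e.getSQLState();
                boolean connectionError = state != null && state.startsWith("08");
                if (!connectionError) {
                    throw e; // data exception: retrying would fail the same way
                }
                last = e; // communication exception: rerun the work from the start
            }
        }
        throw last;
    }

    public static void main(String[] args) throws SQLException {
        int[] calls = { 0 };
        String result = runWithRetry(() -> {
            if (++calls[0] == 1) {
                throw new SQLException("link down", "08S01");
            }
            return "ok";
        }, 3);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Rerunning the whole unit of work matters for transactions: statements executed before the switch ran on a different host, so they cannot be assumed to be part of whatever is committed afterwards.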

@Override
    public synchronized Object invokeMore(Object proxy, Method method, Object[] args) throws Throwable {
        String methodName = method.getName();

        ....

        if (this.isClosed && !allowedOnClosedConnection(method)) {
            if (this.autoReconnect && !this.closedExplicitly) {
                this.currentHostIndex = NO_CONNECTION_INDEX; // Act as if this is the first connection but let it sync with the previous one.
                pickNewConnection();
                this.isClosed = false;
                this.closedReason = null;
            } else {
                String reason = "No operations allowed after connection closed.";
                if (this.closedReason != null) {
                    reason += ("  " + this.closedReason);
                }
                throw SQLError.createSQLException(reason, SQLError.SQL_STATE_CONNECTION_NOT_OPEN, null /* no access to a interceptor here... */);
            }
        }

        Object result = null;

        try {
            result = method.invoke(this.thisAsConnection, args);
            result = proxyIfReturnTypeIsJdbcInterface(method.getReturnType(), result);
        } catch (InvocationTargetException e) {
            dealWithInvocationException(e);
        }

        ....

        return result;
    }

The four conditions are:

- this.isClosed is true
- !allowedOnClosedConnection(method) is true
- this.autoReconnect is true
- !this.closedExplicitly is true

The isClosed field is set to true in MultiHostConnectionProxy#invoke:

public synchronized Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        String methodName = method.getName();

        ....

        if (METHOD_CLOSE.equals(methodName)) {
            doClose();
            this.isClosed = true;
            this.closedReason = "Connection explicitly closed.";
            this.closedExplicitly = true;
            return null;
        }

        if (METHOD_ABORT_INTERNAL.equals(methodName)) {
            doAbortInternal();
            this.currentConnection.abortInternal();
            this.isClosed = true;
            this.closedReason = "Connection explicitly closed.";
            return null;
        }

        if (METHOD_ABORT.equals(methodName) && args.length == 1) {
            doAbort((Executor) args[0]);
            this.isClosed = true;
            this.closedReason = "Connection explicitly closed.";
            return null;
        }

        ....
    }

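When the caller invokes one of certain methods, isClosed is set to true. These methods are:

Constant	Value	Declaring interface
METHOD_CLOSE	close	java.sql.Connection
METHOD_ABORT_INTERNAL	abortInternal	com.mysql.jdbc.Connection
METHOD_ABORT	abort	java.sql.Connection

After one of these methods is called, isClosed is set to true and the call returns null right away. This implies that the four-condition check only takes place when another Connection method is called afterwards, for example conn.close() followed by conn.createStatement().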

The condition !allowedOnClosedConnection(method) holds as long as the caller is not invoking one of the following connection methods:

Constant	Value	Declaring interface
METHOD_GET_AUTO_COMMIT	getAutoCommit	java.sql.Connection
METHOD_GET_CATALOG	getCatalog	java.sql.Connection
METHOD_GET_TRANSACTION_ISOLATION	getTransactionIsolation	java.sql.Connection
METHOD_GET_SESSION_MAX_ROWS	getSessionMaxRows	com.mysql.jdbc.Connection

The data these methods return was already fetched from MySQL when the current connection was established, so no live round trip to MySQL is needed.

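autoReconnect defaults to false. It can be changed by adding autoReconnect=true or autoReconnectForPools=true to the URL.

For !this.closedExplicitly, first look at where closedExplicitly is set to true. The only method that does so is:

Constant	Value	Declaring interface
METHOD_CLOSE	close	java.sql.Connection

So the condition holds as long as the method the caller invoked was not the connection's close method.

Summary

To sum up: if the caller invokes abortInternal or abort on the connection, has added autoReconnect=true or autoReconnectForPools=true to the URL, and then calls a method such as createStatement, pickNewConnection is executed and the connection is replaced. In other words, this gives the caller a way to request a connection switch on its own initiative.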
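The check can be condensed into a small decision function that makes the three possible outcomes explicit. This is only a restatement of the branch at the top of invokeMore: ReconnectCheckSketch, Outcome and check are hypothetical names, and the four booleans are passed in rather than read from proxy state.

```java
final class ReconnectCheckSketch {
    enum Outcome { PROCEED, PICK_NEW_CONNECTION, REJECT }

    // Restates the guard at the top of invokeMore as a pure function.
    static Outcome check(boolean isClosed, boolean allowedOnClosedConnection,
                         boolean autoReconnect, boolean closedExplicitly) {
        if (isClosed && !allowedOnClosedConnection) {
            if (autoReconnect && !closedExplicitly) {
                return Outcome.PICK_NEW_CONNECTION; // all four conditions hold
            }
            return Outcome.REJECT; // "No operations allowed after connection closed."
        }
        return Outcome.PROCEED; // connection is open, or the method is allowed when closed
    }

    public static void main(String[] args) {
        // abort() with autoReconnect=true, then createStatement()
        System.out.println(check(true, false, true, false));
        // close() with autoReconnect=true, then createStatement()
        System.out.println(check(true, false, true, true));
        // close(), then getAutoCommit()
        System.out.println(check(true, true, false, true));
    }
}
```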

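5. Failover-Specific Options

Failover mode has several dedicated options that can be added to the URL. Their effect is described below from the code's point of view.

retriesAllDown

When the current connection hits a communication exception, failOver is entered; retriesAllDown controls how many rounds the set of ip:port pairs in the URL is cycled through. In principle, one round means every pair has been tried once. Whether the first pair, which is the primary server, is tried depends additionally on secondsBeforeRetryMaster and queriesBeforeRetryMaster.

secondsBeforeRetryMaster and queriesBeforeRetryMaster

secondsBeforeRetryMaster is the number of seconds to wait, after the primary server (the first ip:port pair) has become unreachable, before trying it again. queriesBeforeRetryMaster is the number of SQL operations that must be issued after that failure before the primary is tried again. As the code below shows, meeting either threshold is enough.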

synchronized boolean readyToFallBackToPrimaryHost() {
        return this.enableFallBackToPrimaryHost && connectedToSecondaryHost() && (secondsBeforeRetryPrimaryHostIsMet() || queriesBeforeRetryPrimaryHostIsMet());
    }

// Checks whether enough time has passed since the connection to the primary server was lost
private synchronized boolean secondsBeforeRetryPrimaryHostIsMet() {
        return this.secondsBeforeRetryPrimaryHost > 0 && Util.secondsSinceMillis(this.primaryHostFailTimeMillis) >= this.secondsBeforeRetryPrimaryHost;
    }

// Checks whether enough operations have been executed since the connection to the primary server was lost
private synchronized boolean queriesBeforeRetryPrimaryHostIsMet() {
        return this.queriesBeforeRetryPrimaryHost > 0 && this.queriesIssuedSinceFailover >= this.queriesBeforeRetryPrimaryHost;
    }

FailoverJdbcInterfaceProxy#invoke increments queriesIssuedSinceFailover for methods whose names start with execute. As the code above shows, the counter is compared with queriesBeforeRetryPrimaryHost:

@Override
public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            String methodName = method.getName();

            boolean isExecute = methodName.startsWith("execute");

            if (FailoverConnectionProxy.this.connectedToSecondaryHost() && isExecute) {
                FailoverConnectionProxy.this.incrementQueriesIssuedSinceFailover();
            }

            Object result = super.invoke(proxy, method, args);

    ....
}


// FailoverConnectionProxy#incrementQueriesIssuedSinceFailover performs the increment
synchronized void incrementQueriesIssuedSinceFailover() {
        this.queriesIssuedSinceFailover++;
    }

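autoReconnect and autoReconnectForPools

When either option is set in the URL, it becomes the value of the MultiHostConnectionProxy#autoReconnect field. That field was covered in "The Four-Condition Check" above.

failOverReadOnly

The official documentation describes it as follows: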
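To make the options discussed so far concrete, here is a small sketch that assembles a failover URL carrying them. FailoverUrlSketch and buildUrl are hypothetical helpers, the host names and the values 3, 30 and 50 are arbitrary examples rather than recommendations, and only the option names themselves come from the driver.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

final class FailoverUrlSketch {
    // Builds jdbc:mysql://host1,host2,.../db?key=value&... from its parts.
    static String buildUrl(List<String> hosts, String database, Map<String, String> options) {
        StringBuilder url = new StringBuilder("jdbc:mysql://")
                .append(String.join(",", hosts))
                .append('/')
                .append(database);
        if (!options.isEmpty()) {
            StringJoiner query = new StringJoiner("&", "?", "");
            options.forEach((key, value) -> query.add(key + "=" + value));
            url.append(query.toString());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("retriesAllDown", "3");            // rounds over the host list
        options.put("secondsBeforeRetryMaster", "30"); // time threshold for falling back
        options.put("queriesBeforeRetryMaster", "50"); // query-count threshold for falling back
        System.out.println(buildUrl(Arrays.asList("ip1:3306", "ip2:3306"), "dbname", options));
    }
}
```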

Sequence A, with failOverReadOnly=true:

Connects to primary host in read/write mode
Sets Connection.setReadOnly(true); primary host now in read-only mode
Failover event; connects to secondary host in read-only mode
Sets Connection.setReadOnly(false); secondary host remains in read-only mode
Falls back to primary host; connection now in read/write mode

Sequence B, with failOverReadOnly=false

Connects to primary host in read/write mode
Sets Connection.setReadOnly(true); primary host now in read-only mode
Failover event; connects to secondary host in read-only mode
Set Connection.setReadOnly(false); connection to secondary host switches to read/write mode
Falls back to primary host; connection now in read/write mode

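The gist: the first ip:port pair in the URL (MySQL on the primary server) is read/write by default, while MySQL on the other servers is read-only. With failOverReadOnly=false in the URL, if the primary connection fails and another server's connection is in use, the caller can update data after executing Connection.setReadOnly(false).

In the code, a readOnly value is computed before the connection is switched. The computation reflects the official description of readOnly.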

private synchronized void switchCurrentConnectionTo(int hostIndex, MySQLConnection connection) throws SQLException {
        invalidateCurrentConnection();

        boolean readOnly;
        if (isPrimaryHostIndex(hostIndex)) {
            readOnly = this.explicitlyReadOnly == null ? false : this.explicitlyReadOnly;
        } else if (this.failoverReadOnly) {
            readOnly = true;
        } else if (this.explicitlyReadOnly != null) {
            readOnly = this.explicitlyReadOnly;
        } else if (this.currentConnection != null) {
            readOnly = this.currentConnection.isReadOnly();
        } else {
            readOnly = false;
        }
        syncSessionState(this.currentConnection, connection, readOnly);
        this.currentConnection = connection;
        this.currentHostIndex = hostIndex;
    }

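When the target of the switch is the primary server, the only input is the explicitlyReadOnly field. Its type is Boolean, so it stays null until it is explicitly assigned, which happens through the proxy connection's setReadOnly method. The primary connection is therefore read/write unless setReadOnly(true) was called earlier, in which case that value wins.

Switching to a server other than the primary is the main scenario to watch. failoverReadOnly is a boolean populated from the failOverReadOnly option, which the official documentation lists with a default of true; with that default, a connection to a secondary server is always read-only. Only when the option is set to false does the argument passed to the proxy connection's setReadOnly method decide.

From a practical standpoint, we would like the read/write capability of the primary server to carry over to any secondary server we switch to. To make sure the next connection has the same read/write capability as the current one, the caller has to call setReadOnly(false) as soon as it notices that the underlying connection has been switched. More directly, after catching a SQLException it can check the exception type and call setReadOnly(false) when it is a communication exception. This, however, adds mental burden and extra work for developers.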
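The branch order in switchCurrentConnectionTo is easier to reason about as a pure function. The sketch below copies only that decision: ReadOnlyDecisionSketch and decide are hypothetical names, and a null currentReadOnly stands for "there is no current connection".

```java
final class ReadOnlyDecisionSketch {
    // Same branch order as switchCurrentConnectionTo, with the proxy state passed in.
    static boolean decide(boolean targetIsPrimary, boolean failoverReadOnly,
                          Boolean explicitlyReadOnly, Boolean currentReadOnly) {
        if (targetIsPrimary) {
            return explicitlyReadOnly != null && explicitlyReadOnly;
        }
        if (failoverReadOnly) {
            return true; // secondaries are forced to read-only
        }
        if (explicitlyReadOnly != null) {
            return explicitlyReadOnly; // the caller's last setReadOnly(...) wins
        }
        if (currentReadOnly != null) {
            return currentReadOnly; // inherit from the connection being replaced
        }
        return false;
    }

    public static void main(String[] args) {
        // failOverReadOnly=true: a secondary is read-only even after setReadOnly(false)
        System.out.println(decide(false, true, Boolean.FALSE, Boolean.FALSE));
        // failOverReadOnly=false: setReadOnly(false) keeps the secondary writable
        System.out.println(decide(false, false, Boolean.FALSE, Boolean.TRUE));
        // falling back to the primary with no explicit setReadOnly call
        System.out.println(decide(true, true, null, Boolean.TRUE));
    }
}
```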

6. The Official Stance

The official documentation is cautious about switching connections automatically:

Seamless Reconnection

Although not recommended, you can make the driver perform failovers without invalidating the active Statement or ResultSet instances by setting either the parameter autoReconnect or autoReconnectForPools to true. This allows the client to continue using the same object instances after a failover event, without taking any exceptional measures. This, however, may lead to unexpected results: for example, if the driver is connected to the primary host with read/write access mode and it fails-over to a secondary host in real-only mode, further attempts to issue data-changing queries will result in errors, and the client will not be aware of that. This limitation is particularly relevant when using data streaming: after the failover, the ResultSet looks to be alright, but the underlying connection may have changed already, and no backing cursor is available anymore.

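A Statement instance created under the failover module has a JDBC4MultiHostMySQLConnection instance as its connection field. Whenever the Statement needs to do something through the connection, the JDBC4MultiHostMySQLConnection instance resolves to the currentConnection field of the FailoverConnectionProxy instance. After a failover that field points to a different underlying connection, which may be in read-only mode; hence the official caution.

To avoid this, every underlying connection must be in read/write mode, which requires two things:

- Add failOverReadOnly=false to the URL. Because the option defaults to true, leaving it out is not enough.
- After catching a SQLException, the caller checks the exception type and calls setReadOnly(false) if it is a communication exception.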
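The second point could look roughly like the helper below. It is a sketch, not an established recipe: RestoreWritableSketch and restoreWritableAfter are hypothetical names, the "08" SQLState prefix is the only test applied, and main uses a recording java.lang.reflect.Proxy in place of a real failover connection so that it runs without a database.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;

final class RestoreWritableSketch {
    static boolean isConnectionError(SQLException e) {
        String state = e.getSQLState();
        return state != null && state.startsWith("08");
    }

    // Call from a catch block; returns true if setReadOnly(false) was issued.
    static boolean restoreWritableAfter(SQLException e, Connection conn) throws SQLException {
        if (!isConnectionError(e)) {
            return false; // data exception: the underlying connection was not switched
        }
        conn.setReadOnly(false); // ask the (possibly new) underlying connection for read/write
        return true;
    }

    public static void main(String[] args) throws SQLException {
        // Stand-in connection that only logs setReadOnly calls (assumption: no real database).
        Connection fake = (Connection) Proxy.newProxyInstance(
                Connection.class.getClassLoader(), new Class<?>[] { Connection.class },
                (proxy, method, methodArgs) -> {
                    if ("setReadOnly".equals(method.getName())) {
                        System.out.println("setReadOnly(" + methodArgs[0] + ")");
                    }
                    return null;
                });
        System.out.println(restoreWritableAfter(new SQLException("link down", "08S01"), fake));
        System.out.println(restoreWritableAfter(new SQLException("duplicate key", "23000"), fake));
    }
}
```

Whether the call takes effect on a secondary server still depends on failOverReadOnly=false being set, as described in the first point.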

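Conclusion

This article first analyzed the structure of the failover module and then used the two distinctions to analyze how the mode handles exceptions. Whenever an exception occurs, the caller can perceive it. For a communication exception raised while the proxy connection is in use, the driver first tries to replace the underlying connection and only then rethrows the exception. So after the caller has caught the exception, the next use of the proxy connection may run on a different underlying connection. Because the caller can perceive exceptions, the mode has some practical value.

We also saw that the read/write mode may change after a failover, and tried to offer a remedy. To ensure that every underlying connection can both read and write, developers have to write dedicated handling code. So when a project moves from a single MySQL database to several, developers need to change code. Is there a mode that gives developers an easier upgrade path? See the follow-up article "MySQL Connector/J Source Code Analysis (LoadBalance)".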
