Fast Connection Failover
March 10th, 2013This topic has consistently come up whenever a database or Real Application Cluster (RAC) node fails. The question is always, “why don’t the applications gracefully failover?” It’s a fair question, and the answer is really that it’s harder than it looks. Oracle has offered Transparent Application Failover (TAF) for a very long time, but TAF only works for OCI based applications like sqlplus and JDBC thick clients. Since the majority of applications seem to use JDBC thin connections TAF isn’t really an option. With JDBC thin connections there is the Fast Connection Failover (FCF) feature. It works in conjunction with a connection cache to provide the ability to recover from a RAC node failure without having to restart the application. And of course with all things Oracle, the connection cache feature has now changed to Universal Connection Pooling (UCP). The other pieces of FCF include Fast Application Notification (FAN) events and Oracle Notification Service (ONS). ONS is used to transport the FAN event notifications.
There is a lot of documentation from Oracle and on the web about how to set up FCF and UCP. There isn’t much documentation though on how to implement it once you have it set up. That’s really what I wanted to talk about, but you do have to understand how to set it up to use it. I have included most of the resources I used to try to understand FCF and UCP at the end of this post. I have also added a working sample to the appsdba.com web site so you can try this out yourself.
To set up the FCF example you will need Java 1.6 and three jar files: ucp.jar, ons.jar, ojdbc6.jar. The jar files are available on otn.oracle.com (OTN) and I used the 11.2.0.3 versions. You can probably use an older version of Java if you can find compatible versions of the jar files, but you’re on your own for that. I’ve only tested with the 11.2.0.3 versions on OTN.
The example from the appsdba.com web site is called FcfExample.java and if you open it up with a text editor there are setup instructions. The instructions are the following:
* To run the test: * * Set JAVA_HOME to your Java 1.6 directory * (set JAVA_HOME="C:\Program Files (x86)\Java\jdk1.6.0_37\bin") * Add JAVA_HOME to the path * (set PATH="%JAVA_HOME%;%PATH%") * Add the source directory to the LIBS variable * (set LIBS="c:\temp\jdbc_fcf_test") * Add jar files to the CLASSPATH (ons.jar, ucp.jar, ojdbc6.jar) * (set CLASSPATH="c:\temp\jdbc_fcf_test;c:\temp\jdbc_fcf_test\ons.jar;c:\temp\jdbc_fcf_test\ojdbc6.jar;c:\temp\jdbc_fcf_test\ * ucp.jar" * * Compile with: javac FcfExample.java * Run with: java FcfExample * * The idea is to start the script running, then kill the instance or service * the script is running on (that's why the instance name is listed) and * the code is written so that the query will restart once a new connection * is re-established. * * An additional file is created called FcfExample.log which lists the FAN * events coming from ONS during the run. You should see the connection fail * and then get re-established.
A couple of comments. I created this example using three main sources (i.e. see references [10], [11], [15]), and the example also writes the FAN messages to a separate log file.
Now the interesting part of the example is the actual code that demonstrates how to recover from a failover.
do {
try {
conn = ds.getConnection();
conn.setAutoCommit(false);
System.out.println("Obtained a new connection from pool ...");
System.out.println();
// Print connection stats
JDBCConnectionPoolStatistics stats = ds.getStatistics();
if (stats != null) {
printStat("AbandonedConnectionsCount", stats);
printStat("AvailableConnectionsCount", stats);
printStat("AverageBorrowedConnectionsCount", stats);
printStat("AverageConnectionWaitTime", stats);
printStat("BorrowedConnectionsCount", stats);
printStat("ConnectionsClosedCount", stats);
printStat("ConnectionsCreatedCount", stats);
printStat("RemainingPoolCapacityCount", stats);
printStat("TotalConnectionsCount", stats);
System.out.println();
System.out.println("***");
System.out.println();
}
// Start work here, it has been made restartable
if (retry) {
if (step == "getBanner") {
retry = false;
getBanner(conn);
}
} else {
step = "getBanner";
getBanner(conn);
}
if (retry) {
if (step == "getInstance") {
retry = false;
getInstance(conn);
}
} else {
step = "getInstance";
getInstance(conn);
}
if (retry) {
if (step == "runQuery") {
retry = false;
runQuery(conn);
}
} else {
step = "runQuery";
runQuery(conn);
}
if (retry) {
if (step == "getInstance") {
retry = false;
getInstance(conn);
}
} else {
step = "getInstance";
getInstance(conn);
}
}
catch (SQLException sqlexc) {
System.out.println();
System.out.println("+-------------------------------+");
System.out.println("| SQL Exception in program |");
System.out.println("+-------------------------------+");
System.out.println("SQL Error Code : " + sqlexc.getErrorCode());
System.out.println("SQL Error Message : " + sqlexc.getMessage());
System.out.println();
// The recommended way to check connection usability after a
// RAC-down event triggers UCP FCF actions.
if (conn == null || !((ValidConnection) conn).isValid()) {
System.out.println("Connection retry necessary ...");
try {
conn.close();
}
catch (Exception closeExc) {
System.out.println("Exception detected when closing connection:");
closeExc.printStackTrace();
}
retry = true;
}
}
} while (retry);
Granted this is a simplistic example, but the point is that the application has to handle a failover. This example uses a loop and an exception handler to catch any failures. A retry boolean is used to check to see if there’s been a failure and a step variable records the current step so the code knows where to restart. The proper restart is dependent on the ordering of the steps but this is just a sequential single thread so that’s not a problem.
When this example is run we see a printout of the connection pool stats, then the database banner and instance information, these run quick, and then a longer running query that counts the number of objects in the database up to a limit. It is at this point that you can kill the RAC node the program is running on, that’s why the instance information is listed, and see how the program fails over and re-runs the query.
The following is from a normal run with no failures:
C:\Temp\FCF>java FcfExample Obtained a new connection from pool ... AbandonedConnectionsCount 0 AvailableConnectionsCount 9 AverageBorrowedConnectionsCount 0 AverageConnectionWaitTime 0 BorrowedConnectionsCount 1 ConnectionsClosedCount 0 ConnectionsCreatedCount 10 RemainingPoolCapacityCount 15 TotalConnectionsCount 10 *** Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production PL/SQL Release 11.2.0.3.0 - Production CORE 11.2.0.3.0 Production TNS for Linux: Version 11.2.0.3.0 - Production NLSRTL Version 11.2.0.3.0 - Production Instance number: 2 Instance name: ORCL2 Host: racvm2 Object count: 1000000000 Instance number: 2 Instance name: ORCL2 Host: racvm2 Released connection ... *** AbandonedConnectionsCount 0 AvailableConnectionsCount 10 AverageBorrowedConnectionsCount 0 AverageConnectionWaitTime 0 BorrowedConnectionsCount 0 ConnectionsClosedCount 0 ConnectionsCreatedCount 10 RemainingPoolCapacityCount 15 TotalConnectionsCount 10 C:\Temp\FCF>
We see the connection pool usage and no messages about failures in the run. The FAN event log shows the following:
Feb 5, 2013 7:46:30 PM oracle.ucp.admin.UniversalConnectionPoolManagerMBeanImpl getUniversalConnectionPoolManagerMBean FINE: Universal Connection Pool Manager MBean created Feb 5, 2013 7:46:30 PM oracle.ucp.jdbc.PoolDataSourceImpl createPoolWithDefaultProperties FINE: Connection pool instance is created with default properties Feb 5, 2013 7:46:30 PM oracle.ucp.jdbc.PoolDataSourceImpl createUniversalConnectionPoolMBean FINE: JDBC PoolMBean created Feb 5, 2013 7:46:30 PM oracle.ucp.jdbc.PoolDataSourceImpl startPool FINE: Connection pool instance is created Feb 5, 2013 7:46:30 PM oracle.ucp.jdbc.oracle.RACManagerImpl start FINE: failover started Feb 5, 2013 7:46:31 PM oracle.ucp.common.UniversalConnectionPoolBase start FINE: universal connection pool started Feb 5, 2013 7:46:31 PM oracle.ucp.jdbc.PoolDataSourceImpl startPool FINE: connection pool is started
Now let’s look at what happens when we stop the node that we’re running from in the middle of the longer running query:
C:\Temp\FCF>java FcfExample Obtained a new connection from pool ... AbandonedConnectionsCount 0 AvailableConnectionsCount 9 AverageBorrowedConnectionsCount 0 AverageConnectionWaitTime 0 BorrowedConnectionsCount 1 ConnectionsClosedCount 0 ConnectionsCreatedCount 10 RemainingPoolCapacityCount 15 TotalConnectionsCount 10 *** Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production PL/SQL Release 11.2.0.3.0 - Production CORE 11.2.0.3.0 Production TNS for Linux: Version 11.2.0.3.0 - Production NLSRTL Version 11.2.0.3.0 - Production Instance number: 1 Instance name: ORCL1 Host: racvm1 +-------------------------------+ | SQL Exception in program | +-------------------------------+ SQL Error Code : 1089 SQL Error Message : ORA-01089: immediate shutdown in progress - no operations are permitted Connection retry necessary ... Obtained a new connection from pool ... AbandonedConnectionsCount 0 AvailableConnectionsCount 8 AverageBorrowedConnectionsCount 0 AverageConnectionWaitTime 0 BorrowedConnectionsCount 1 ConnectionsClosedCount 1 ConnectionsCreatedCount 10 RemainingPoolCapacityCount 16 TotalConnectionsCount 9 *** Object count: 1000000000 Instance number: 3 Instance name: ORCL3 Host: racvm3 Released connection ... *** AbandonedConnectionsCount 0 AvailableConnectionsCount 6 AverageBorrowedConnectionsCount 0 AverageConnectionWaitTime 0 BorrowedConnectionsCount 0 ConnectionsClosedCount 4 ConnectionsCreatedCount 10 RemainingPoolCapacityCount 19 TotalConnectionsCount 6 C:\Temp\FCF>
The RAC node the script was connected to was stopped with the following command:
112:oracle@racvm1:/home/oracle [ORCL1] $ srvctl stop instance -d ORCL -i ORCL1 113:oracle@racvm1:/home/oracle [ORCL1] $
And the FAN event log shows that four connections in the UCP were closed and that the ORCL1 instance was shut down:
C:\Temp\FCF>type FcfExample.log Feb 5, 2013 7:28:08 PM oracle.ucp.admin.UniversalConnectionPoolManagerMBeanImpl getUniversalConnectionPoolManagerMBean FINE: Universal Connection Pool Manager MBean created Feb 5, 2013 7:28:08 PM oracle.ucp.jdbc.PoolDataSourceImpl createPoolWithDefaultProperties FINE: Connection pool instance is created with default properties Feb 5, 2013 7:28:08 PM oracle.ucp.jdbc.PoolDataSourceImpl createUniversalConnectionPoolMBean FINE: JDBC PoolMBean created Feb 5, 2013 7:28:08 PM oracle.ucp.jdbc.PoolDataSourceImpl startPool FINE: Connection pool instance is created Feb 5, 2013 7:28:08 PM oracle.ucp.jdbc.oracle.RACManagerImpl start FINE: failover started Feb 5, 2013 7:28:10 PM oracle.ucp.common.UniversalConnectionPoolBase start FINE: universal connection pool started Feb 5, 2013 7:28:10 PM oracle.ucp.jdbc.PoolDataSourceImpl startPool FINE: connection pool is started Feb 5, 2013 7:28:22 PM oracle.ucp.common.UniversalConnectionPoolBase closePhysicalConnection FINE: closed physical connection Feb 5, 2013 7:28:33 PM oracle.ucp.common.UniversalConnectionPoolBase closePhysicalConnection FINE: closed physical connection Feb 5, 2013 7:28:33 PM oracle.ucp.common.UniversalConnectionPoolBase closePhysicalConnection FINE: closed physical connection Feb 5, 2013 7:28:33 PM oracle.ucp.common.UniversalConnectionPoolBase closePhysicalConnection FINE: closed physical connection Feb 5, 2013 7:28:33 PM oracle.ucp.jdbc.oracle.ONSDatabaseEventHandlerTask run FINE: event triggered: Service name: orcl, Instance name: orcl1, Unique name: orcl, Host name: racvm1, Status: down, Cardinality: 0, Reason: user, Event type: database/event/service
Note that in the failure example that the code restarted at the object count step, we don’t see any information about what node we’re connected to until the last two steps run after the “runQuery” step.
References:
[1] Oracle WebLogic Server and Highly Available Oracle Databases: Oracle Integrated Maximum Availability Solutions – Active GridLink
[2] High-Performance Oracle JDBC Programming
Learn how to improve performance of Oracle-driven JDBC programs by leveraging connection and statement pooling features.
By Yuli Vasiliev
Published April 2009
[3] How to Use the Oracle Universal Connection Pool (UCP) ? [ID 759208.1]
[4] Transitioning to Oracle Universal Connection Pool (UCP)
[5] Fast Connection Failover (FCF) Test Client Using 11g JDBC Driver and 11g RAC Cluster [ID 566573.1]
[6] How to Use the Oracle Universal Connection Pool (UCP) ? [ID 759208.1]
[7] Understanding Transparent Application Failover (TAF) and Fast Connection Failover (FCF) [ID 334471.1]
[8] How to Verify Universal Connection Pool (UCP) / Fast Connection Failover (FCF) Setup [ID 1064652.1]
[9] JDBC Fast Connection Failover with Oracle RAC, Configuring and testing with Tomcat and the Spring framework, February 08, 2010
[10] URL:http://www.drdobbs.com/jvm/jdbc-fast-connection-failover-with-oracl/222700353
Author: Michael Pilone
[12] Oracle® Database Administrator’s Guide, 11g Release 2 (11.2)
[13] Oracle® Universal Connection Pool for JDBC Developer’s Guide, 11g Release 2 (11.2)
[14] Oracle® Database JDBC Developer’s Guide, 11g Release 2 (11.2)
[15] Oracle Database 11g UCP Downloads, http://www.oracle.com/technetwork/database/enterprise-edition/downloads/ucp-112010-099129.html – ucpdemos.jar (FCFSample.java)

Entries