As a follow-up to an earlier blog, someone asked "Can you please tell us more about gravitationShrinkFrequencySeconds and DRCP?" DRCP support isn’t shipping yet (wait for the next versions of the Database and of WLS), so I can’t talk about it.
I think that people understand how connections are allocated in generic data sources and multi data sources (MDS). Let me talk generally about how connections are allocated in Active GridLink (AGL). With the former (generic and MDS), connection allocation is pretty much dependent on WLS. With AGL, it is also very dependent on the database configuration and runtime statistics. The runtime information is provided to WLS via Oracle Notification Service (ONS) in the form of up and down events and Runtime Load Balancing events.
Connections are added to the pool initially based on the configured initial capacity. Connect-time load balancing based on the listener(s) is used to spread the connections across the instances in the RAC cluster. For that to work correctly, you must either specify a Single Client Access Name (SCAN) address or use LOAD_BALANCE=ON for multiple non-SCAN addresses. If you have multiple addresses and you forget to use LOAD_BALANCE=ON, then all of the connections will end up on the first RAC instance – not what you want. Connection load balancing is intended to give out connections based on load on the available RAC instances. It’s not perfect. If you have two instances and they are evenly loaded, there should be approximately 50% of the connections on each instance (but don’t enter a bug if they split up 49/51). There is a general problem with load balancing: the statistics can lag reality. If you have big swings in demand on the two instances, then you can end up creating a connection on the more heavily loaded instance.
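To make that concrete, here is roughly what the two styles of URL look like. The hostnames, port, and service name below are placeholders, not values from any real configuration:

```
# SCAN address: the SCAN listener spreads the connections for you
jdbc:oracle:thin:@//myscan.example.com:1521/myservice

# Multiple non-SCAN addresses: LOAD_BALANCE=ON is what spreads the connections
jdbc:oracle:thin:@(DESCRIPTION=
  (ADDRESS_LIST=(LOAD_BALANCE=ON)
    (ADDRESS=(PROTOCOL=TCP)(HOST=host1.example.com)(PORT=1521))
    (ADDRESS=(PROTOCOL=TCP)(HOST=host2.example.com)(PORT=1521)))
  (CONNECT_DATA=(SERVICE_NAME=myservice)))
```

Drop the LOAD_BALANCE=ON from the second form and the listener on host1 gets every connection request.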
If you go to get a connection from the pool, runtime load balancing is used to pick the instance. That’s again based on the load on the various instances and the information is updated by default every 30 seconds. If a connection is available on the desired instance, then you have a hit on the pool and you get the available connection.
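The selection itself amounts to a weighted pick over the percentages advertised in the Runtime Load Balancing events. Here is a minimal sketch in Java of how such a pick could work; the class and method names, and the 70/30 split, are made up for illustration and are not the actual WLS internals:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

public class RlbPick {
    // weights: instance name -> percentage share of new work (should sum to 100)
    // roll: a number in [0, 100); we walk the cumulative distribution until we pass it
    public static String pick(Map<String, Integer> weights, int roll) {
        int cumulative = 0;
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            cumulative += e.getValue();
            if (roll < cumulative) {
                return e.getKey();
            }
        }
        throw new IllegalArgumentException("roll exceeds total weight");
    }

    public static void main(String[] args) {
        // inst1 is lightly loaded, so the RLB events give it 70% of new work
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("inst1", 70);
        weights.put("inst2", 30);
        int roll = new Random().nextInt(100);
        System.out.println("picked " + pick(weights, roll));
    }
}
```

Because the weights are refreshed only every 30 seconds by default, a burst of reserves between refreshes all sees the same distribution – which is the statistics-lagging-reality problem mentioned above.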
If you go to get a connection from the pool and one doesn’t exist on the desired instance, then you have a miss. In this case, a connection is added to the pool on demand based on connection load balancing. That is, if you go to reserve a connection and there isn’t one available in the pool, then a new one is created and it is done using connection load balancing as described above. There’s an assumption here that the connect time and runtime load balancing match up.
If you are running within an XA transaction, then that takes priority in determining the connection that you get. There are restrictions on XA support within a RAC cluster such that all processing for an XA transaction branch must take place on one instance in the cluster (the restrictions are complicated so the one-instance limitation is the only safe approach). Additionally, performance is significantly better using a single instance so you want that to happen anyway. The first time that you get a connection, it’s via the mechanisms described above (runtime load balancing or connection load balancing). After that, any additional requests for connections within the same XA transaction will get a connection from the same instance. If you can’t get a connection with “XA affinity,” the application will get an exception.
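Conceptually, XA affinity is just a pinning map from transaction to instance: the first reserve records the load-balanced choice, and every later reserve in the same transaction is forced to it. A toy sketch (class and method names are hypothetical, not the WLS implementation):

```java
import java.util.HashMap;
import java.util.Map;

public class XaAffinity {
    // transaction id -> the RAC instance pinned by the first reserve
    private final Map<String, String> txToInstance = new HashMap<>();

    // Returns the instance to use for this transaction. The first call wins:
    // later calls ignore the load-balanced choice and return the pinned instance.
    public String instanceFor(String txId, String loadBalancedChoice) {
        return txToInstance.computeIfAbsent(txId, id -> loadBalancedChoice);
    }
}
```

The real pool differs in an important way: if no connection to the pinned instance can be obtained, it does not fall back to another instance – the application gets an exception, as described above.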
Similar to XA affinity, if you are running within a Web session, you will get affinity for multiple connections created by the same session. Unlike XA affinity, “Session Affinity” is not mandatory. If you can’t get a connection to the same instance, it will go to another instance (but there will be a performance penalty).
When you take down a RAC instance, it generates a “planned down event”. Any unused connections for that instance are released immediately and connections in use are released when returned to the pool. If a RAC instance fails, it generates an “unplanned down event.” In that case, all connections for that instance are destroyed immediately.
When a RAC instance becomes available, whether a newly added instance or a restarted one, it generates an “up event.” When that occurs, connections are proactively created on the new instance.
The pool can get filled up with connections to the “wrong” instance (heavily loaded) if load changes over time. To help that situation, we release unused connections based on runtime load balancing information. When gravitation shrinking occurs, one unused connection is destroyed on a heavily loaded instance. The default period for gravitation shrinking is 30 seconds but you can control it by setting the system property “weblogic.jdbc.gravitationShrinkFrequencySeconds” to an integer value. We don’t actively create a connection at this point. We wait until there is demand for a new connection and then the connection load balancing will kick in.
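For example, to check for gravitation shrinking every 60 seconds instead of every 30, you could add the property to the JVM arguments used to start the server (the value of 60 is just illustrative):

```shell
# Append the property to the server start arguments
export JAVA_OPTIONS="${JAVA_OPTIONS} -Dweblogic.jdbc.gravitationShrinkFrequencySeconds=60"
```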
Finally, normal shrinking happens if it is not disabled. When this occurs, half of the unused connections are destroyed, down to the minimum capacity. The algorithm currently just takes the first set of connections on the list without regard to load balancing (it’s random with respect to instances). The default period is 900 seconds and you can configure it using ShrinkFrequencySeconds.
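ShrinkFrequencySeconds lives in the connection pool parameters of the data source descriptor; a fragment might look like the following (the 300-second value is illustrative):

```
<jdbc-connection-pool-params>
  <!-- check for shrinking every 300 seconds instead of the default 900 -->
  <shrink-frequency-seconds>300</shrink-frequency-seconds>
</jdbc-connection-pool-params>
```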
There are possible improvements that could be made with respect to pool misses, and gravitational and normal shrinking. And the database team is working on improving the load balancing done on their end. Still, it works pretty well with instances dynamically coming and going and the load changing over time.