Introduction
JRockit has a hard limit on the number of fat locks that can be "live" at once. While this limit is very large, the use of ever larger heap sizes makes hitting this limit more likely. In this post, I want to explain what exactly this limit is and how you can work around it if you need to.
Background
Java locks (AKA monitors) in JRockit come in one of two basic varieties: thin and fat. (We'll leave recursive and lazy locking out of the conversation for now.) For a detailed explanation of how we implement locking in JRockit, I highly recommend reading chapter 4 of JR:TDG (Oracle JRockit: The Definitive Guide). For now, all you need to understand is the basic difference between thin and fat locks. Thin locks are lightweight and have very little overhead, but any thread trying to acquire a thin lock must spin until the lock becomes available. Fat locks are heavyweight and have more overhead, but threads waiting for them can queue up and sleep, saving CPU cycles. As long as contention for a lock is low, a thin lock is preferable; under high contention, a fat lock is ideal. So a lock normally begins its life as a thin lock and is converted to a fat lock only once the JVM decides that there is enough contention to justify it. Converting a thin lock into a fat lock is known as inflation, and converting it back is known as deflation.
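To make the thin/fat life cycle concrete, here is a toy, single-threaded Java model of the policy described above. It is a sketch under stated assumptions, not JRockit's implementation: the class name `ToyLockModel`, its counters, and the `inflateAfter`/`deflateAfter` thresholds are all hypothetical, and JRockit's real inflation and deflation heuristics are more involved.

```java
// Toy, single-threaded model of the thin/fat lock life cycle.
// All names and thresholds are hypothetical; this is not JRockit code.
class ToyLockModel {
    enum Mode { THIN, FAT }

    private final int inflateAfter;  // contended acquisitions before inflating
    private final int deflateAfter;  // consecutive quiet acquisitions before deflating
    private Mode mode = Mode.THIN;   // every lock starts out thin
    private int contended = 0;
    private int quiet = 0;

    ToyLockModel(int inflateAfter, int deflateAfter) {
        this.inflateAfter = inflateAfter;
        this.deflateAfter = deflateAfter;
    }

    /** Records one acquisition of the lock and returns the resulting mode. */
    Mode acquire(boolean wasContended) {
        if (mode == Mode.THIN) {
            if (wasContended && ++contended >= inflateAfter) {
                inflate();  // enough contention: let waiters queue up and sleep
            }
        } else {
            quiet = wasContended ? 0 : quiet + 1;
            if (quiet >= deflateAfter) {
                mode = Mode.THIN;  // deflation: back to cheap spinning
                contended = 0;
            }
        }
        return mode;
    }

    /** wait() needs a wait queue, which only a fat lock has, so it always inflates. */
    Mode onWait() {
        if (mode == Mode.THIN) {
            inflate();
        }
        return mode;
    }

    private void inflate() {
        mode = Mode.FAT;
        quiet = 0;
    }

    public static void main(String[] args) {
        ToyLockModel lock = new ToyLockModel(3, 2);
        boolean[] pattern = {true, true, true, false, false};
        for (boolean c : pattern) {
            System.out.println((c ? "contended   -> " : "uncontended -> ") + lock.acquire(c));
        }
        System.out.println("wait()      -> " + lock.onWait());
    }
}
```

With thresholds of 3 and 2, the lock stays thin for two contended acquisitions, inflates on the third, deflates after two quiet acquisitions, and inflates again as soon as wait() is called.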
Limitation
One of the reasons we call fat locks "heavyweight" is that we need to maintain much more data for each individual lock. For example, we need to keep track of any threads that have called wait() on it (the wait queue) and of any threads that are waiting to acquire it (the lock queue). For quick access, we store this lock information in an array, which gives us constant-time lookup. We'll call this the monitor array. Each object that corresponds to a fat lock holds an index into this array, stored in a part of the object header known as the lock word. The lock word is a 32-bit value that contains several flags related to locking (and to the garbage collection system) in addition to the monitor array index (in the case of a fat lock). After the 10 flag bits, there are 22 bits left for the index value, which limits the monitor array to 2^22 (4,194,304) entries, or space to keep track of just over 4 million fat locks.
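The bit budget can be checked with a little arithmetic. This sketch packs a monitor-array index and the flag bits into a 32-bit `int`. The post only says there are 10 flag bits and 22 index bits; putting the flags in the low bits and the index in the high bits is an assumption, and `LockWordSketch` is a hypothetical name, not JRockit code.

```java
// Hypothetical 32-bit lock word: 22-bit monitor index (high bits, assumed)
// followed by 10 flag bits (low bits, assumed).
final class LockWordSketch {
    static final int FLAG_BITS = 10;                    // flag bits per the text
    static final int INDEX_BITS = 32 - FLAG_BITS;       // 22 bits left for the index
    static final int MAX_MONITORS = 1 << INDEX_BITS;    // 2^22 = 4,194,304 slots
    static final int FLAG_MASK = (1 << FLAG_BITS) - 1;  // low 10 bits

    /** Packs a monitor-array index and flag bits into one 32-bit word. */
    static int encode(int monitorIndex, int flags) {
        if (monitorIndex < 0 || monitorIndex >= MAX_MONITORS) {
            throw new IllegalArgumentException(
                    "monitor index does not fit in " + INDEX_BITS + " bits: " + monitorIndex);
        }
        return (monitorIndex << FLAG_BITS) | (flags & FLAG_MASK);
    }

    /** Extracts the index; the unsigned shift matters because bit 31 may be set. */
    static int index(int lockWord) {
        return lockWord >>> FLAG_BITS;
    }

    static int flags(int lockWord) {
        return lockWord & FLAG_MASK;
    }

    public static void main(String[] args) {
        System.out.println("max monitors: " + MAX_MONITORS);
        int word = encode(MAX_MONITORS - 1, 0x3);
        System.out.println("index " + index(word) + ", flags " + flags(word));
        // encode(MAX_MONITORS, 0) throws: index 4,194,304 would need 23 bits.
    }
}
```

The largest index that fits, 4,194,303, is the same "maximum possible monitor index" that JRockit reports when the array overflows.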
Now, for a fat lock to be considered "live", meaning it requires an entry in the monitor array, its object must still be on the heap. If the object is garbage collected or the lock is deflated, its slot in the array is cleared and made available to hold information about a different lock. Note that because we depend on GC to clean up the monitor array, the lock information is still considered "live" even after the object itself is no longer part of the live set (meaning it is eligible for collection), and its slot cannot be recycled until the object actually gets collected.
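A minimal sketch of this slot life cycle, assuming a fixed-capacity array plus a free list (the class `MonitorArraySketch` and its methods are hypothetical, not JRockit internals). Inflation claims a slot; a slot returns to the free list only through `release`, which stands in for deflation or for GC discovering that the owning object is dead. Merely dropping the last reference to an object does nothing until one of those happens.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical fixed-size monitor array with a free list of slot indices.
final class MonitorArraySketch {
    /** Per-lock bookkeeping that only a fat lock needs. */
    static final class Monitor {
        final Deque<Thread> waitQueue = new ArrayDeque<>();  // threads in wait()
        final Deque<Thread> lockQueue = new ArrayDeque<>();  // threads blocked on entry
    }

    private final Monitor[] slots;
    private final Deque<Integer> freeSlots = new ArrayDeque<>();

    MonitorArraySketch(int capacity) {
        slots = new Monitor[capacity];
        for (int i = 0; i < capacity; i++) {
            freeSlots.addLast(i);  // hand out low indices first
        }
    }

    /** Inflation: claim a free slot and return its index (what the lock word stores). */
    int inflate() {
        Integer slot = freeSlots.poll();
        if (slot == null) {
            // JRockit aborts the whole JVM at this point; the sketch just throws.
            throw new IllegalStateException("monitor array is full");
        }
        slots[slot] = new Monitor();
        return slot;
    }

    /** Stands in for deflation, or for GC finding that the owning object is dead. */
    void release(int slot) {
        if (slots[slot] == null) {
            throw new IllegalStateException("slot " + slot + " is already free");
        }
        slots[slot] = null;
        freeSlots.addFirst(slot);
    }

    int liveCount() {
        return slots.length - freeSlots.size();
    }

    public static void main(String[] args) {
        MonitorArraySketch monitors = new MonitorArraySketch(2);
        int first = monitors.inflate();
        monitors.inflate();
        // The first object may already be unreachable, but until GC runs
        // (or its lock deflates) its slot stays taken, so this overflows:
        try {
            monitors.inflate();
        } catch (IllegalStateException e) {
            System.out.println("overflow: " + e.getMessage());
        }
        monitors.release(first);  // simulated collection of the first object
        System.out.println("reused slot " + monitors.inflate());
    }
}
```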
So what happens when we use up all of the available slots in the monitor array? Unfortunately, we abort and the JVM exits with an error message like this:
===
[ERROR] JRockit Fatal Error: The number of active Object monitors has overflowed. (87)
[ERROR] The number of used monitors is 4194304, and the maximum possible monitor index 4194303
===
Want to see for yourself? Try the test case below. One way to guarantee that a lock gets inflated by JRockit is to call wait() on it. So we'll just keep calling wait() on new objects until we run out of slots.
=== LockLeak.java
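import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

public class LockLeak extends Thread {
    // Every object we lock on stays reachable from this list, so GC can never
    // free its monitor-array slot. The synchronized wrapper at least makes the
    // list itself safe to share between the two threads.
    static List<Object> list = Collections.synchronizedList(new LinkedList<Object>());

    public static void main(String[] arg) {
        boolean threadStarted = false;
        // 5,000,000 iterations is more than the 2^22 (4,194,304) monitor slots.
        for (int i = 0; i < 5000000; i++) {
            Object obj = new Object();
            synchronized (obj) {
                list.add(0, obj);
                if (!threadStarted) {
                    // Start the helper thread that keeps waking us up.
                    (new LockLeak()).start();
                    threadStarted = true;
                }
                try {
                    // wait() forces JRockit to inflate obj's lock into a fat lock.
                    obj.wait();
                } catch (InterruptedException ie) {} // eat Exception
            }
        }
        System.out.println("done!"); // you must not be on JRockit!
        System.exit(0);
    }

    public void run() {
        // Spin forever, notifying whichever object main() most recently added.
        while (true) {
            Object obj = list.get(0);
            synchronized (obj) {
                obj.notify();
            }
        }
    }
}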
===
(Yes, this code is not even remotely thread safe. Please don't write code like this in real life, and don't blame me for whatever horrible fate befalls you if you do. Consider this code for entertainment purposes only. You have been warned.)
Resolution
While this may seem like a very serious limitation, in practice even the most demanding applications are very unlikely to hit it. And even if you do have a system that runs up against this limit, you should be able to tune around the issue without too much difficulty. The key point is that GC is required to clean up the monitor array: the more frequently you collect your heap, the sooner "stale" monitor information (lock information for objects that are no longer part of the live set) is removed.
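To see why collection frequency matters, consider a simple simulation under idealized assumptions: each step inflates one lock on a short-lived object that dies immediately afterwards, and every collection reclaims the slots of all dead objects. Peak monitor-array occupancy is then the number of inflations between collections. The numbers are illustrative rather than measured, and `GcFrequencySketch` is a hypothetical name.

```java
// Idealized model: stale monitor entries pile up between collections.
final class GcFrequencySketch {
    /**
     * Peak number of occupied monitor slots when `steps` short-lived objects
     * each inflate one lock and die right away, and a collection that frees
     * every dead object's slot runs after every `gcInterval` inflations.
     */
    static int peakOccupancy(int steps, int gcInterval) {
        if (gcInterval <= 0) {
            throw new IllegalArgumentException("gcInterval must be positive");
        }
        int occupied = 0;
        int peak = 0;
        for (int step = 1; step <= steps; step++) {
            occupied++;  // inflation claims a slot
            peak = Math.max(peak, occupied);
            // The object is already dead here, but only GC frees its slot.
            if (step % gcInterval == 0) {
                occupied = 0;  // collection clears every stale entry
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        int capacity = 1 << 22;  // 4,194,304 monitor slots
        int steps = 10_000_000;
        for (int interval : new int[] {100_000, 5_000_000}) {
            int peak = peakOccupancy(steps, interval);
            System.out.println("GC every " + interval + " inflations: peak " + peak
                    + (peak > capacity ? " (overflow)" : " (fits)"));
        }
    }
}
```

With a collection every 100,000 inflations the peak stays far below 2^22, while a gap of 5,000,000 inflations between collections overflows the array even though at most one of these objects is ever live at a time.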
As an example, one of our fellow product teams here at Oracle recently hit this limit while using a 50GB heap with a single-space (non-generational) collector. By enabling the nursery (switching to a generational collector), they were able to avoid the issue completely. Because short-lived objects were now collected promptly, the monitor array no longer filled up with entries for dead objects that would otherwise have had to wait for a full GC to be removed.
Another possible solution is to set the -XX:FatLockDeflationThreshold option to a value below its default of 50 so that fat locks are deflated more aggressively. While this works well for simple test cases like LockLeak.java above, I believe that more aggressive garbage collection is more likely to resolve the issue without a negative performance impact.
Either way, we have never seen anyone hit this problem who was not able to tune around the limitation very easily. It is hard to imagine that any real system will ever need more than 4 million fat locks all at once. But in all seriousness, given JRockit's current focus on stability and the lack of a use case that requires more, we are almost certainly never going to make the significant (read: risky) changes that removing or expanding this limit would require. The good news is that HotSpot does not seem to have a similar limitation.
Conclusion
You are very unlikely to ever see this issue unless you are running an application with a very large heap, a lot of lock contention, and very infrequent collections. By tuning the JVM to collect dead objects that correspond to fat locks sooner, for example by enabling a young collector, you should be able to avoid this limit easily. In practice, no application today (or in the near future) will really need more than 4 million fat locks at once. As long as you help the JVM prune the monitor array frequently enough, you should never even notice this limit.