One of my regular duties during years of technical support has been analyzing JVM thread dumps. It is interesting work: it helps you understand how the logic in the source code actually runs and, more importantly, it teaches you how to design Java EE applications that avoid high CPU usage, resource contention, and similar problems. One typical bad design is the deadlock. It seldom happens in modern Java development, but I will demonstrate each of these scenarios with sample source code and show how to find the root cause from a thread dump. Everything will be practical, with no abstract theory.
This article discusses one scenario you will see often: high CPU usage, such as 99% or 100%. The root cause in this case is quite easy to identify.
Let's read the sample source code first. It is just an example; no developer would write code like this in practice.
    package zigzag.research.threaddump;

    public class HighCPU {

        public static void main(String[] args) {
            while (true) {
                // do nothing
            }
        }
    }
On Linux
Run it, and the top command shows the problematic parent process (note that by default top shows the process on Linux, not its threads).
Now let's look at the JVM thread dump (I removed the sections I do not care about). Which thread below is the problematic one? One in WAITING status? No! A Java VM has many daemon threads, and most of the time they are WAITING.
    "Finalizer" daemon prio=1 tid=0x08fb6428 nid=0xbea in Object.wait() [0xb59d5000..0xb59d60b0]
        at java.lang.Object.wait(Native Method)
        - waiting on(a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
        - locked(a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

    "Reference Handler" daemon prio=1 tid=0x08fb5ea8 nid=0xbe9 in Object.wait() [0xb5a56000..0xb5a56e30]
        at java.lang.Object.wait(Native Method)
        - waiting on(a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:474)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        - locked(a java.lang.ref.Reference$Lock)

    "main" prio=1 tid=0x08f128d0 nid=0xbe5 runnable [0xff80c000..0xff80c5e8]
        at zigzag.research.threaddump.HighCPU.main(HighCPU.java:6)
On Linux, we can press Shift-H in top, and the individual threads will be displayed instead of the parent PID. The man page for top describes the option as follows:
    -H : Threads toggle
        Starts top with the last remembered 'H' state reversed. When this toggle is On, all individual threads will be displayed. Otherwise, top displays a summation of all threads in a process.
The top screenshot above was taken with the Shift-H toggle on. The problematic thread appears in the PID column as 3045 (decimal), which is 0xBE5 in hexadecimal. So you've got it: that value matches the nid of this thread in the dump.
    "main" prio=1 tid=0x08f128d0 nid=0xbe5 runnable [0xff80c000..0xff80c5e8]
        at zigzag.research.threaddump.HighCPU.main(HighCPU.java:6)
On Windows
It used to be difficult to get OS-level thread information on Windows in order to analyze a thread dump. Luckily, third-party tools like Process Explorer can show the problematic TID, as in the screenshot below.
You will see CPU usage rise to almost 100% if your computer has a single-core CPU. In my example it consumes 24.71% because mine is a 4-core CPU.
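The arithmetic behind that figure can be sketched in a few lines. This is only an illustration, assuming the tool reports usage relative to all cores (as the 24.71% reading on a 4-core machine suggests); the class and method names are hypothetical.

```java
public class CpuShare {

    /**
     * Approximate share of total CPU (in percent) taken by one thread
     * that keeps a single core fully busy.
     */
    static double busyThreadShare(int cores) {
        return 100.0 / cores;
    }

    public static void main(String[] args) {
        // Number of cores visible to this JVM
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.printf("%d cores: one busy thread is about %.2f%% of total CPU%n",
                cores, busyThreadShare(cores));
    }
}
```

With 4 cores this gives 25%, close to the 24.71% observed; with 1 core it gives 100%.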
The screenshot above shows a problematic TID that takes 24.71% of CPU time and never goes down, which is abnormal. Here the TID is the native thread ID at the operating system level. It is not the "tid" in the JVM thread dump; the two are different. The OS-level TID corresponds to the nid (native thread ID) in the JVM thread dump.
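Since the OS tools show the thread ID in decimal while the thread dump prints the nid in hexadecimal, a conversion is needed before you can search the dump. Here is a minimal sketch using only standard JDK calls; the class and method names are my own, chosen for illustration.

```java
public class NidConverter {

    /** Formats a decimal OS thread ID the way a thread dump prints the nid, e.g. 3045 gives "0xbe5". */
    static String toNid(int osThreadId) {
        return "0x" + Integer.toHexString(osThreadId);
    }

    /** Parses an nid such as "0xbe5" back into the decimal value shown by the OS tools. */
    static int fromNid(String nid) {
        return Integer.decode(nid); // decode() understands the 0x prefix
    }

    public static void main(String[] args) {
        System.out.println(toNid(3045));      // prints 0xbe5
        System.out.println(fromNid("0xa94")); // prints 2708
    }
}
```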
On my machine the TID (native thread ID) is 2708 in decimal, which is 0xA94 in hexadecimal. Searching the thread dump for that nid gives you the stack of the real problematic thread:
    "main" prio=6 tid=0x000000000054b000 nid=0xa94 runnable [0x00000000025cf000]
       java.lang.Thread.State: RUNNABLE
        at zigzag.research.threaddump.HighCPU.main(HighCPU.java:6)
Summary
Java nid (native thread ID, in hexadecimal) = the per-thread PID that Linux top shows with Shift-H (in decimal) = the Windows thread TID (in decimal)
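To put the summary to work, the lookup can be automated. The following is a rough sketch, not a full thread dump parser: it assumes header lines shaped like the ones in this article (a quoted thread name followed by nid=0x...), and the class name DumpThreadFinder and its method are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DumpThreadFinder {

    // Matches a thread header line: "name" ... nid=0x<hex> ...
    private static final Pattern HEADER =
            Pattern.compile("^\\s*\"([^\"]+)\".*\\bnid=(0x[0-9a-fA-F]+)");

    /** Returns the name of the thread whose nid equals the decimal OS thread ID, or null if none matches. */
    static String findThreadName(List<String> dumpLines, long osThreadId) {
        for (String line : dumpLines) {
            Matcher m = HEADER.matcher(line);
            if (m.find() && Long.decode(m.group(2)) == osThreadId) {
                return m.group(1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Header lines taken from the Linux thread dump in this article
        List<String> dump = Arrays.asList(
                "\"Finalizer\" daemon prio=1 tid=0x08fb6428 nid=0xbea in Object.wait() [0xb59d5000..0xb59d60b0]",
                "\"Reference Handler\" daemon prio=1 tid=0x08fb5ea8 nid=0xbe9 in Object.wait() [0xb5a56000..0xb5a56e30]",
                "\"main\" prio=1 tid=0x08f128d0 nid=0xbe5 runnable [0xff80c000..0xff80c5e8]");
        System.out.println(findThreadName(dump, 3045)); // prints main
    }
}
```

Feed it the lines of a saved dump and the decimal ID reported by top or Process Explorer, and it returns the name of the thread to inspect.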