What Does the Java Virtual Machine Do All Day?
This article was originally published by Ampere Computing
I saw a recent blog post about gprofng, a new GNU performance analysis tool. That post used a matrix-vector multiplication program written in C as its example. I am a Java™ programmer, and analyzing Java applications with tools designed for statically compiled C programs is often difficult, because Java programs are compiled at runtime. In this blog post I will show that gprofng is easy to use and very useful for digging into the dynamic behavior of a Java application.
The first step is to write a matrix multiplication program. I wrote a full matrix-times-matrix program because it is no harder than multiplying a matrix by a vector. There are three main methods: one method computes the innermost multiply-and-add, one method accumulates those multiply-and-adds into a single element of the result, and one method iterates over every element of the result.
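The post does not include the Java source, but judging from the method names that appear in the profiles below (MxV.multiply, MxV.oneCell, MxV.multiplyAdd), the three methods presumably decompose something like this sketch (the method bodies are my reconstruction, not the author's code):

```java
// A sketch of the three methods, reconstructed from the method names
// in the profiles below; the actual MxV source may differ.
public class MxV {

    // Innermost operation: one multiply-and-add.
    static double multiplyAdd(double sum, double a, double b) {
        return sum + (a * b);
    }

    // Accumulate the multiply-and-adds for one element of the result.
    static double oneCell(double[][] a, double[][] b, int i, int j, int n) {
        double sum = 0.0;
        for (int k = 0; k < n; k += 1) {
            sum = multiplyAdd(sum, a[i][k], b[k][j]);
        }
        return sum;
    }

    // Iterate over every element of the result.
    static void multiply(double[][] result, double[][] a, double[][] b) {
        for (int i = 0; i < result.length; i += 1) {
            for (int j = 0; j < result[i].length; j += 1) {
                result[i][j] = oneCell(a, b, i, j, a[i].length);
            }
        }
    }
}
```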
I wrapped the computation in a simple test harness that computes the matrix product repeatedly, to make sure the timings are repeatable. (See Note 1.) The program prints the time at which each matrix multiplication starts (relative to the start of the Java virtual machine) and how long each multiplication takes. (A sketch of such a harness appears after Figure 1.) Here I ran the test program to multiply two 8000×8000 matrices. The test program repeated the computation 11 times and, to better highlight the behavior later on, slept for 920 milliseconds between repetitions:
```bash
$ numactl --cpunodebind=0 --membind=0 -- \
    java -XX:+UseParallelGC -Xms31g -Xmx31g -Xlog:gc -XX:-UsePerfData \
    MxV -m 8000 -n 8000 -r 11 -s 920
```
Figure 1: Running the matrix multiplication program
Note that the second repetition takes only 92% of the time of the first repetition, and the last repetition takes only 89%. These variations in execution time confirm that Java programs need some time to warm up.
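The harness itself is not shown in the post either. Continuing the sketch above, a minimal version of the timing loop it describes might look like the following (the repetition count and sleep interval correspond to the -r and -s flags on the command line; everything else here is my assumption):

```java
import java.lang.management.ManagementFactory;

// Hypothetical timing harness matching the behavior described in the
// text; not the author's actual code.
class MxVHarness {
    static void run(int m, int n, int repetitions, long sleepMillis)
            throws InterruptedException {
        double[][] a = new double[m][n];
        double[][] b = new double[n][m];
        double[][] result = new double[m][m];
        // ... fill a and b with test data (MxV.initialize in the profiles) ...
        for (int r = 0; r < repetitions; r += 1) {
            // Milliseconds since JVM start, so output lines up with -Xlog:gc.
            long startMs = ManagementFactory.getRuntimeMXBean().getUptime();
            long begin = System.nanoTime();
            MxV.multiply(result, a, b);
            double elapsed = (System.nanoTime() - begin) / 1e9;
            System.out.printf("%d: at %.3f s, took %.3f s%n",
                              r, startMs / 1e3, elapsed);
            Thread.sleep(sleepMillis); // aligns phases on ~1-second boundaries
        }
    }
}
```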
The question is: can I use gprofng to see what happens between the first and the last repetitions that accounts for the improvement in performance?
One way to answer that question is to run the program and let gprofng collect information about the run. Fortunately, that is easy: I simply prefix the command line to collect what gprofng calls an "experiment":
```bash
$ numactl --cpunodebind=0 --membind=0 -- \
    gprofng collect app \
    java -XX:+UseParallelGC -Xms31g -Xmx31g -Xlog:gc -XX:-UsePerfData \
    MxV -m 8000 -n 8000 -r 11 -s 920
```
Figure 2: Running the matrix multiplication program under gprofng
The first thing to note is that, like any performance analysis tool, collecting profiling information imposes a cost on the application. Compared to the earlier, unprofiled run, however, gprofng does not appear to add significant overhead.
I can then ask gprofng how time was spent across the entire application. (See Note 2.) For the whole run, the 24 hottest methods reported by gprofng are:
```bash
$ gprofng display text test.1.er -viewmode expert -limit 24 -functions
```
Figure 3: The 24 hottest methods, as shown by gprofng
The function view shown above gives the exclusive and inclusive CPU time for each method, both in seconds and as a percentage of the total CPU time. (Exclusive time is time spent in the method itself; inclusive time also counts the methods it calls, so time in MxV.multiplyAdd shows up in the inclusive time of MxV.oneCell, which calls it, but not in MxV.oneCell's exclusive time.) The function named <Total> is a pseudo-function generated by gprofng that holds the totals of the various metrics. Here I can see that the total CPU time of the entire application is 1.201 seconds.
The application methods (those from the MxV class) are all there and account for most of the CPU time, but some other methods show up as well, including the JVM's runtime compiler (Compilation::Compilation) and other functions that are not part of the matrix multiplication proper. This display of the whole program run also captures the allocation (MxV.allocate) and initialization (MxV.initialize) code, which I am not very interested in: it is part of the test harness, runs only during startup, and has almost nothing to do with the matrix multiplication.
I can use gprofng to focus on the parts of the application I am interested in. A great feature of gprofng is that, after collecting an experiment, I can apply filters to the collected data: for example, to look at what happens during a specific time interval, or while a particular method is on the call stack. For demonstration purposes, and to make filtering easier, I added strategic calls to Thread.sleep(ms) so that the program's phases fall on one-second boundaries, which makes it easy to write filters based on program phase. That is why the program output in Figure 1 above shows about one second between repetitions, even though each matrix multiplication takes only about 0.1 seconds.
gprofng is scriptable, so I wrote a script to extract individual seconds from the gprofng experiment. The first second is all about Java virtual machine startup.
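The post does not reproduce the script, but a sketch of the idea might look like the following. It assumes gprofng accepts er_print-style filter expressions over sample timestamps; the TSTAMP keyword, the && operator, and the nanosecond units are my assumptions, so check the `gprofng display text` documentation for the exact filter syntax:

```bash
#!/bin/bash
# Hypothetical script: show the hottest methods during second [N, N+1)
# of the experiment. The filter syntax is assumed, not verified.
second=$1                       # which second of the run to extract
beg=$(( second * 1000000000 ))  # start of the interval, in nanoseconds
end=$(( beg + 1000000000 ))     # end of the interval
gprofng display text test.1.er \
    -viewmode expert \
    -limit 24 \
    -filters "TSTAMP >= $beg && TSTAMP < $end" \
    -functions
```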
Figure 4: The hottest methods during the first second. The matrix multiplication has been deliberately delayed within this second so that the JVM startup can be seen
I can see that the runtime compiler is starting up (e.g., Compilation::compile_java_method, at 16% of CPU time), even though none of the application's methods have begun running yet. (The matrix multiplication calls are delayed by the sleep calls I inserted.)
Next comes a second during which the allocation and initialization methods run alongside various JVM methods, but the matrix multiplication code has not yet started.
Figure 5: The hottest methods in the second second. Matrix allocation and initialization compete with JVM startup
Now that JVM startup and the allocation and initialization of the arrays have finished, the third second shows the first repetition of the matrix multiplication code, as seen in Figure 6. Note, though, that the matrix multiplication code competes with the Java runtime compiler (e.g., CompileBroker::invoke_compiler_on_method, at 8% in Figure 6), which is compiling methods now that the matrix multiplication code has been found to be hot.
Even so, the matrix multiplication code (e.g., the "inclusive" time of the MxV.main method, at 91%) gets most of the CPU time. The inclusive time shows that the matrix multiplication (i.e., MxV.multiply) takes 0.100 CPU seconds, which agrees with the wall-clock time the application reported in Figure 2. (Gathering and reporting the wall-clock time takes some wall-clock time of its own, outside the CPU time that gprofng attributes to MxV.multiply.)
Figure 6: The hottest methods in the third second, showing that the runtime compiler competes with the matrix multiplication methods
In this particular example the matrix multiplication is not really starved for CPU time, because the test runs on a multiprocessor system with plenty of idle cycles and the runtime compiler runs in separate threads. In more constrained circumstances, say on a heavily loaded shared machine, that 8% spent in the runtime compiler might be a problem. On the other hand, time spent in the runtime compiler produces more efficient implementations of the methods, so if I were computing many matrix multiplications it is an investment I would be willing to make.
By the fifth second, the matrix multiplication code has the Java virtual machine to itself.
Figure 7: All the methods running during the fifth second, showing that only the matrix multiplication methods are active
Note the 60%/30%/10% split of exclusive CPU seconds between MxV.oneCell, MxV.multiplyAdd, and MxV.multiply. The MxV.multiplyAdd method computes nothing but a multiply and an add, yet it is the innermost method of the matrix multiplication. MxV.oneCell has a loop that calls MxV.multiplyAdd; its loop and call overhead (evaluating the condition and transferring control) is evidently relatively more work than the straight arithmetic in MxV.multiplyAdd. (The difference shows in the exclusive times: 0.060 CPU seconds for MxV.oneCell against 0.030 CPU seconds for MxV.multiplyAdd.) The outer loop in MxV.multiply executes infrequently enough that the runtime compiler has not yet compiled it, but the method is using 0.010 CPU seconds.
The matrix multiplications continue until the ninth second, when the JVM runtime compiler kicks in again, having discovered that MxV.multiply has now become hot.
In the last repetition, the matrix multiplication code makes full use of the Java virtual machine.
Figure 9: The last repetition of the matrix multiplication program, showing the final configuration of the code
I have shown how easy it is to gain insight into the runtime behavior of Java applications by using gprofng for performance analysis. Using gprofng's filtering feature to examine an experiment by time slices, I was able to look at just the program phases of interest. For example, excluding the allocation and initialization phases of the application, and displaying just one repetition of the program while the runtime compiler was working its magic, let me highlight the improving performance as the hot code was progressively compiled.
For readers who want to learn more about gprofng, there is a blog post with introductory videos about gprofng, including instructions on how to install it on Oracle Linux.
Thanks to Ruud van der Pas, Kurt Goebel and Vladimir Mezentsev for their advice and technical support, and to Elena Zannoni, David Banman, Craig Hardy and Dave Neary for encouraging me to write this blog post.