
Testing the impact of garbage collector GC on throughput in Java

高洛峰
2017-01-17 15:48:26

While browsing a glossary of memory management terms, I stumbled upon the definition of "Pig in the Python" (translator's note: much like the Chinese saying about a greedy snake trying to swallow an elephant), and that is how this article came about. On the surface, the term describes a GC that keeps promoting large objects from one generation to the next. This is like a python swallowing its prey whole and being unable to move while it digests.

For the next 24 hours, I could not get the image of this choking python out of my head. As psychiatrists say, the best way to get rid of a fear is to talk about it. Hence this article. The story that follows, however, is not about pythons but about GC tuning. I promise.

It is well known that GC pauses can easily become a performance bottleneck. Modern JVMs ship with advanced garbage collectors, but in my experience finding the optimal configuration for a particular application is extremely hard. Manual tuning can still succeed, but only if you understand the exact mechanics of the GC algorithms. This article should help with that: below I use an example to show how a small change in JVM configuration affects the throughput of an application.

Example

The application we use to demonstrate the impact of GC on throughput is just a simple program. It contains two threads:

PigEater – simulates a giant python eating one fat pig after another. The code does this by adding a 32MB byte array to a java.util.List and sleeping for 100ms after each meal.
PigDigester – simulates asynchronous digestion. The code simply replaces the list of pigs with a new, empty one. Since this is tiring work, the thread sleeps for 2000ms after each such cleanup.

Both threads run in a while loop, eating and digesting until the snake is full, which happens after approximately 5,000 pigs.

package eu.plumbr.demo;

import java.util.ArrayList;
import java.util.List;

public class PigInThePython {
  // Shared between both threads; volatile so that the reference swap is visible.
  static volatile List<byte[]> pigs = new ArrayList<>();
  static volatile int pigsEaten = 0;
  static final int ENOUGH_PIGS = 5000;

  public static void main(String[] args) throws InterruptedException {
    new PigEater().start();
    new PigDigester().start();
  }

  // Allocates one 32MB "pig" every 100ms.
  static class PigEater extends Thread {
    @Override
    public void run() {
      while (true) {
        pigs.add(new byte[32 * 1024 * 1024]); // 32MB per pig
        if (pigsEaten > ENOUGH_PIGS) return;
        takeANap(100);
      }
    }
  }

  // Every 2000ms, counts the pigs eaten so far and drops the list so it becomes garbage.
  static class PigDigester extends Thread {
    @Override
    public void run() {
      long start = System.currentTimeMillis();
      while (true) {
        takeANap(2000);
        pigsEaten += pigs.size();
        pigs = new ArrayList<>();
        if (pigsEaten > ENOUGH_PIGS) {
          System.out.format("Digested %d pigs in %d ms.%n", pigsEaten, System.currentTimeMillis() - start);
          return;
        }
      }
    }
  }

  static void takeANap(int ms) {
    try {
      Thread.sleep(ms);
    } catch (InterruptedException e) {
      e.printStackTrace();
    }
  }
}

We now define the throughput of this system as "the number of pigs digested per second". Since a pig is stuffed into the python every 100ms, the theoretical maximum throughput of this system is 10 pigs/second.
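
The arithmetic behind this definition can be sketched in a few lines. This is only an illustration: the class PigThroughput and its method names are made up here and are not part of the original demo, and the elapsed time passed in main is an arbitrary sample value, not a measured result.

```java
// Hypothetical helper, not part of the original test application.
class PigThroughput {
  // Upper bound: at most one pig is added per nap interval.
  static double theoreticalMax(long napMillis) {
    return 1000.0 / napMillis;
  }

  // Measured throughput: pigs digested divided by elapsed seconds.
  static double measured(int pigsEaten, long elapsedMillis) {
    return pigsEaten / (elapsedMillis / 1000.0);
  }

  public static void main(String[] args) {
    System.out.println(theoreticalMax(100));     // prints 10.0
    // 625_000 ms is an illustrative value only.
    System.out.println(measured(5000, 625_000)); // prints 8.0
  }
}
```

Feeding the two values printed by the "Digested %d pigs in %d ms." line into measured() would give the throughput of an actual run.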

GC Configuration Example

Let’s see how the system performs under two different configurations. In both cases, the application runs on a dual-core Mac (OS X 10.9.3) with 8GB of RAM.

First configuration:

1. 4G heap (-Xms4g -Xmx4g)
2. CMS to clean the old generation (-XX:+UseConcMarkSweepGC) and the parallel collector to clean the new generation (-XX:+UseParNewGC)
3. 12.5% of the heap (-Xmn512m) allocated to the new generation, with the Eden and Survivor spaces restricted to equal sizes

The second configuration is slightly different:

1. 2G heap (-Xms2g -Xmx2g)
2. Parallel GC for both the new generation and the old generation (-XX:+UseParallelGC)
3. 75% of the heap allocated to the new generation (-Xmn1536m)
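
Before comparing results, it can be worth confirming that the flags actually took effect. The following is a minimal sketch rather than part of the original test: the class name ShowGcConfig is an assumption made for illustration, and it relies only on the standard java.lang.management API.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports the heap limit and the collectors of the running JVM.
class ShowGcConfig {
  // Maximum heap the JVM will attempt to use, in megabytes.
  static long maxHeapMb() {
    return Runtime.getRuntime().maxMemory() / (1024 * 1024);
  }

  // Names of the garbage collectors registered in this JVM.
  static List<String> collectorNames() {
    List<String> names = new ArrayList<>();
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      names.add(gc.getName());
    }
    return names;
  }

  public static void main(String[] args) {
    System.out.println("Max heap (MB): " + maxHeapMb());
    System.out.println("Collectors: " + collectorNames());
  }
}
```

Run it with the same flags as the test application. The collector names are JVM-specific, and the reported maximum heap may differ slightly from the -Xmx value, so treat the output as a sanity check rather than an exact readout.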
Now it is time to place your bets: which configuration performs better (in pigs eaten per second, remember)? Those who put their chips on the first configuration will be disappointed. The results are exactly the opposite:

1. The first configuration (large heap, large old generation, CMS GC) eats 8.2 pigs per second
2. The second configuration (small heap, large new generation, Parallel GC) eats 9.2 pigs per second

Let me put these results in perspective. With half the heap, throughput improved by 12% (9.2 versus 8.2 pigs per second). This runs so counter to common sense that it calls for a closer look at what is going on.

Analyzing GC results

The reason is actually not complicated. You only need to look closely at what the GC is doing during the test run, using whatever tool you prefer. I found the answer with the help of jstat, using a command similar to this:

jstat -gc -t -h20 PID 1s
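
If attaching jstat is not convenient, roughly the same totals can be sampled from inside the JVM. The sketch below is an alternative to the jstat approach used in this article, not what was used for the measurements; the class name GcTotals is hypothetical, and the counters come from the standard GarbageCollectorMXBean interface.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: sums GC counts and times over all collectors of this JVM.
class GcTotals {
  // Total number of collections so far; a collector reporting -1 (undefined) is skipped.
  static long totalCount() {
    long total = 0;
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      total += Math.max(0, gc.getCollectionCount());
    }
    return total;
  }

  // Approximate accumulated collection time in milliseconds.
  static long totalTimeMillis() {
    long total = 0;
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      total += Math.max(0, gc.getCollectionTime());
    }
    return total;
  }

  public static void main(String[] args) throws InterruptedException {
    // Sample once per second, similar in spirit to the 1s interval passed to jstat.
    for (int i = 0; i < 3; i++) {
      System.out.printf("collections=%d, time=%d ms%n", totalCount(), totalTimeMillis());
      Thread.sleep(1000);
    }
  }
}
```

On its own this main method only observes an idle JVM. To be useful, the two static methods would have to be called from within the application under test, for example from a background thread.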

Looking at the data, I noticed that configuration 1 went through 1129 GC cycles (YGC + FGC: 1011 + 118), taking a total of 63.723 seconds:

Timestamp       S0C       S1C       S0U       S1U        EC        EU         OC         OU       PC      PU   YGC    YGCT  FGC   FGCT     GCT
    594.0  174720.0  174720.0  163844.1       0.0  174848.0  131074.1  3670016.0  2621693.5  21248.0  2580.9  1006  63.182  116  0.236  63.419
    595.0  174720.0  174720.0  163842.1       0.0  174848.0   65538.0  3670016.0  3047677.9  21248.0  2580.9  1008  63.310  117  0.236  63.546
    596.1  174720.0  174720.0   98308.0  163842.1  174848.0  163844.2  3670016.0   491772.9  21248.0  2580.9  1010  63.354  118  0.240  63.595
    597.0  174720.0  174720.0       0.0  163840.1  174848.0  131074.1  3670016.0   688380.1  21248.0  2580.9  1011  63.482  118  0.240  63.723

The second configuration went through only 168 GC cycles (YGC + FGC: 27 + 141), taking a total of 11.409 seconds:

Timestamp       S0C       S1C       S0U       S1U         EC         EU        OC        OU       PC      PU  YGC   YGCT  FGC   FGCT     GCT
    539.3  164352.0  164352.0       0.0       0.0  1211904.0    98306.0  524288.0  164352.2  21504.0  2579.2   27  2.969  141  8.441  11.409
    540.3  164352.0  164352.0       0.0       0.0  1211904.0   425986.2  524288.0  164352.2  21504.0  2579.2   27  2.969  141  8.441  11.409
    541.4  164352.0  164352.0       0.0       0.0  1211904.0   720900.4  524288.0  164352.2  21504.0  2579.2   27  2.969  141  8.441  11.409
    542.3  164352.0  164352.0       0.0       0.0  1211904.0  1015812.6  524288.0  164352.2  21504.0  2579.2   27  2.969  141  8.441  11.409
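
The totals quoted for both configurations can be recomputed from the last row of each listing. The sketch below assumes the 16-column layout shown above; jstat on other JVM versions prints different columns, so the indexes would need adjusting there. The class name JstatRow is made up for illustration.

```java
// Hypothetical helper: extracts GC totals from one line of the jstat output shown above.
class JstatRow {
  // 0-based column positions after splitting on whitespace (Timestamp is column 0).
  static final int YGC = 11;
  static final int FGC = 13;
  static final int GCT = 15;

  // Last sampled rows of configuration 1 and configuration 2, copied from the listings.
  static final String CONFIG_1_LAST_ROW =
      "597.0 174720.0 174720.0 0.0 163840.1 174848.0 131074.1 3670016.0 688380.1 "
      + "21248.0 2580.9 1011 63.482 118 0.240 63.723";
  static final String CONFIG_2_LAST_ROW =
      "542.3 164352.0 164352.0 0.0 0.0 1211904.0 1015812.6 524288.0 164352.2 "
      + "21504.0 2579.2 27 2.969 141 8.441 11.409";

  // Young plus full collection counts.
  static long totalCollections(String row) {
    String[] cols = row.trim().split("\\s+");
    return Long.parseLong(cols[YGC]) + Long.parseLong(cols[FGC]);
  }

  // Total GC time in seconds, as reported in the GCT column.
  static double totalGcSeconds(String row) {
    String[] cols = row.trim().split("\\s+");
    return Double.parseDouble(cols[GCT]);
  }

  public static void main(String[] args) {
    // prints "1129 cycles, 63.723 s"
    System.out.println(totalCollections(CONFIG_1_LAST_ROW) + " cycles, " + totalGcSeconds(CONFIG_1_LAST_ROW) + " s");
    // prints "168 cycles, 11.409 s"
    System.out.println(totalCollections(CONFIG_2_LAST_ROW) + " cycles, " + totalGcSeconds(CONFIG_2_LAST_ROW) + " s");
  }
}
```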

The work to be done was equivalent in both cases: with no long-lived objects in sight, the GC's only job in this pig-eating experiment is to get rid of garbage as quickly as possible. With the first configuration, the GC ran about 6 to 7 times as often (1129 versus 168 cycles), and the total GC time was 5 to 6 times as long (63.7 versus 11.4 seconds).

I told this story for two reasons. First and foremost, I wanted to get this convulsing python out of my head. The other, more obvious takeaway is that GC tuning is a tricky exercise that requires a thorough understanding of the underlying concepts. Even with the trivial application used in this article, the choice of configuration has a significant impact on throughput and capacity planning. In real-life applications the differences will be even greater. So the choice is yours: you can master these concepts, or you can focus on your day-to-day work and let Plumbr find the most suitable GC configuration for your needs.
