Tip #1: Predict the capacity of the collection
All standard Java collections, as well as most custom and extended implementations (such as Trove and Google's Guava), use arrays under the hood, either of primitive types or of object references. Because an array's size is fixed once it is allocated, adding elements to a collection will in many cases force it to allocate a new, larger array to replace the old one.
Even when no initial size is provided, most collection implementations try to optimize this reallocation and amortize its cost to a minimum. However, the best results come from providing the size when constructing the collection.
Let's analyze the following code as a simple example:
public static <T> List<T> reverse(List<? extends T> list) {
    List<T> result = new ArrayList<>();
    for (int i = list.size() - 1; i >= 0; i--) {
        result.add(list.get(i));
    }
    return result;
}
This method allocates a new list, then fills it with the items from another list in reverse order.
This approach may carry a heavy performance cost, and the place to optimize is the line that adds elements to the new list. With each addition, the list must ensure that its underlying array has enough free slots for the new element. If it does, the element is simply stored in the next free slot. If not, a new underlying array is allocated, the old array's contents are copied into it, and then the new element is added. The result is multiple array allocations, with the discarded old arrays left for the GC to reclaim.
We can avoid these redundant allocations by telling the collection how many elements it will store when constructing it:
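To make the reallocation cost concrete, here is a minimal sketch of an array-backed container that counts how often it has to replace its backing array. The class name GrowableArray and its growth formula are assumptions made for illustration; this is not the actual ArrayList source.

```java
import java.util.Arrays;

/** Hypothetical, simplified stand-in for an array-backed list. */
class GrowableArray {
    private Object[] data;
    private int size;
    private int reallocations; // how many times the backing array was replaced

    GrowableArray(int initialCapacity) {
        data = new Object[Math.max(1, initialCapacity)];
    }

    void add(Object value) {
        if (size == data.length) {
            // No free slot: allocate a larger array and copy the old contents.
            // The old array becomes garbage for the GC to reclaim.
            data = Arrays.copyOf(data, data.length + (data.length >> 1) + 1);
            reallocations++;
        }
        data[size++] = value;
    }

    int size() { return size; }

    int reallocations() { return reallocations; }

    public static void main(String[] args) {
        GrowableArray grown = new GrowableArray(10);
        GrowableArray presized = new GrowableArray(100_000);
        for (int i = 0; i < 100_000; i++) {
            grown.add(i);
            presized.add(i);
        }
        System.out.println("small initial capacity: " + grown.reallocations() + " reallocations");
        System.out.println("presized: " + presized.reallocations() + " reallocations");
    }
}
```

The presized instance never replaces its array, while the other one allocates and discards a series of progressively larger arrays on the way to the same final size.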
public static <T> List<T> reverse(List<? extends T> list) {
    List<T> result = new ArrayList<>(list.size());
    for (int i = list.size() - 1; i >= 0; i--) {
        result.add(list.get(i));
    }
    return result;
}
The code above passes list.size() to the ArrayList constructor, so a backing array large enough for all the elements is allocated up front and the list never has to reallocate during the loop.
Guava's collection classes go a step further, letting you state explicitly whether the number you pass when initializing the collection is an exact count or only an estimate:
List<T> result = Lists.newArrayListWithCapacity(list.size());
List<T> result = Lists.newArrayListWithExpectedSize(list.size());
The former is for cases where we know exactly how many elements the collection will store, while the latter allocates some extra padding to allow for an inaccurate estimate.
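The same idea applies when the target list already exists. The sketch below uses the standard ArrayList.ensureCapacity method to reserve room before a bulk append; the helper name appendReversed is hypothetical and only serves the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReverseAppend {
    /** Appends the elements of source to target in reverse order. */
    static <T> void appendReversed(ArrayList<T> target, List<? extends T> source) {
        // Reserve room for all incoming elements up front, so the backing
        // array is grown at most once instead of repeatedly inside the loop.
        target.ensureCapacity(target.size() + source.size());
        for (int i = source.size() - 1; i >= 0; i--) {
            target.add(source.get(i));
        }
    }

    public static void main(String[] args) {
        ArrayList<String> target = new ArrayList<>(Arrays.asList("x"));
        appendReversed(target, Arrays.asList("a", "b", "c"));
        System.out.println(target); // prints [x, c, b, a]
    }
}
```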
Tip #2: Process the data stream directly
When dealing with data streams, such as reading data from a file or downloading data from the network, the following code is very common:
byte[] fileData = readFileToByteArray(new File("myfile.txt"));
The resulting byte array may then be parsed as an XML document, a JSON object, or a Protocol Buffers message, to name a few common options.
This approach is unwise for large files or files of unpredictable size, because an OutOfMemoryError results when the JVM cannot allocate a buffer big enough to hold the whole file.
Even if the size of the data is manageable, using the above pattern will still cause huge overhead when it comes to garbage collection because it allocates a very large area in the heap to store the file data.
A better way to handle this is to use a suitable InputStream (such as FileInputStream in this example) to pass directly to the parser, instead of reading the entire file into a byte array at once. All mainstream open source libraries provide corresponding APIs to directly accept an input stream for processing, such as:
FileInputStream fis = new FileInputStream(fileName);
MyProtoBufMessage msg = MyProtoBufMessage.parseFrom(fis);
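When no parser is involved, the same principle can be applied by hand. The following sketch reads a stream through one small, reused buffer, so memory use stays constant regardless of the input size. The class and method names are made up for this example, and an in-memory stream stands in for a real file or network stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class StreamingLineCounter {
    /** Counts '\n' bytes while holding only one fixed-size buffer in memory. */
    static long countLines(InputStream in) throws IOException {
        byte[] buffer = new byte[8192]; // reused for every chunk
        long lines = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            for (int i = 0; i < read; i++) {
                if (buffer[i] == '\n') {
                    lines++;
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "a\nb\nc\n".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(sample)) {
            System.out.println(countLines(in)); // prints 3
        }
    }
}
```

In real code the ByteArrayInputStream would be replaced by a FileInputStream or a network stream; the counting logic stays the same.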
Tip #3: Use immutable objects
Immutability has so many benefits that there is no need to list them all here. However, one advantage that is easy to overlook is its effect on garbage collection.
The fields of an immutable object cannot be modified after the object is created (the example here uses fields of reference types), such as:
public class ObjectPair {
    private final Object first;
    private final Object second;

    public ObjectPair(Object first, Object second) {
        this.first = first;
        this.second = second;
    }

    public Object getFirst() {
        return first;
    }

    public Object getSecond() {
        return second;
    }
}
Instantiating this class produces an immutable object: all of its fields are declared final and cannot be changed once construction completes.
Immutability means that all objects referenced by an immutable container were created before the container's construction completed. In GC terms, the container is at least as young as the youngest reference it holds. This means that when collecting the young generation, the GC can skip immutable objects that reside in older generations, because it knows they cannot reference anything in the generation being collected.
Fewer objects to scan means fewer memory pages to scan, which means shorter GC cycles, which in turn means shorter GC pauses and better overall throughput.
Tip #4: Be careful with string concatenation
Strings are probably the most commonly used non-primitive data structure in any JVM-based application. However, their implicit overhead combined with their ease of use makes them a very common culprit for high memory consumption.
The problem obviously lies not with string literals, but with strings that are allocated and built at runtime. Let's take a quick look at an example of dynamically building a string:
public static <T> String toString(T[] array) {
    String result = "[";
    for (int i = 0; i < array.length; i++) {
        result += (array[i] == array ? "this" : array[i]);
        if (i < array.length - 1) {
            result += ", ";
        }
    }
    result += "]";
    return result;
}
This looks like a perfectly good method that takes an array and returns its string representation, but it is disastrous in terms of object allocation.
The syntactic sugar makes it hard to see, but what actually happens behind the scenes is this:
public static <T> String toString(T[] array) {
    String result = "[";
    for (int i = 0; i < array.length; i++) {
        StringBuilder sb1 = new StringBuilder(result);
        sb1.append(array[i] == array ? "this" : array[i]);
        result = sb1.toString();
        if (i < array.length - 1) {
            StringBuilder sb2 = new StringBuilder(result);
            sb2.append(", ");
            result = sb2.toString();
        }
    }
    StringBuilder sb3 = new StringBuilder(result);
    sb3.append("]");
    result = sb3.toString();
    return result;
}
Strings are immutable, which means that each concatenation leaves the original strings unmodified and allocates a new string instead. Additionally, the compiler uses the standard StringBuilder class to perform these concatenations (this is how javac compiled them before Java 9; newer versions use a different mechanism, but every concatenation in the loop still allocates a new string). This is problematic because each iteration implicitly allocates a temporary string and a temporary StringBuilder object to help build the final result.
The best approach is to avoid this situation by using a StringBuilder and appending to it directly, instead of using the native concatenation operator ("+"). Here is an example:
public static <T> String toString(T[] array) {
    StringBuilder sb = new StringBuilder("[");
    for (int i = 0; i < array.length; i++) {
        sb.append(array[i] == array ? "this" : array[i]);
        if (i < array.length - 1) {
            sb.append(", ");
        }
    }
    sb.append("]");
    return sb.toString();
}
Here, we allocate a single StringBuilder at the beginning of the method. From that point on, all strings and array elements are appended to that one StringBuilder. Finally, toString() converts it into a string in one go and the result is returned.
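Tip #1 applies here too: a StringBuilder is itself backed by an array, and its constructor accepts an initial capacity. The sketch below presizes the builder from a rough estimate. The class name and the per-element estimate of 8 characters are assumptions chosen for illustration, not measured values.

```java
class ArrayFormatter {
    // Illustrative guess at the average rendered length of one element,
    // including its ", " separator. Tune it for real data.
    private static final int ESTIMATED_CHARS_PER_ELEMENT = 8;

    static String toString(Object[] array) {
        // Reserve room for the brackets plus the estimated element text,
        // so the builder's internal array rarely needs to grow.
        StringBuilder sb = new StringBuilder(2 + array.length * ESTIMATED_CHARS_PER_ELEMENT);
        sb.append('[');
        for (int i = 0; i < array.length; i++) {
            Object element = array[i];
            if (element == array) {
                sb.append("this"); // avoid infinite recursion on self-reference
            } else {
                sb.append(element);
            }
            if (i < array.length - 1) {
                sb.append(", ");
            }
        }
        sb.append(']');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toString(new Object[] {1, "b", 3.5})); // prints [1, b, 3.5]
    }
}
```

If the estimate turns out too low, the builder still grows on its own, so a wrong guess costs some extra allocation but never correctness.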
Tip #5: Use specialized primitive collections
Java's standard collection library is convenient and generic, allowing semi-static type binding when using collections. This works well if, for example, you want a Set that stores only strings (Set<String>), or a Map with typed keys and values.
The real problem arises when we want a list that stores ints, or a map whose values are doubles. Because generics do not support primitive types, the alternative is to use a wrapper type instead, in this case List<Integer>.
This approach is very wasteful, because an Integer is a full-fledged object. On a typical 64-bit JVM, the object header occupies 12 bytes and the int field it wraps another 4 bytes, for a total of 16 bytes per Integer object. That is four times the space a list of plain ints would need to store the same number of items! An even bigger problem is that, because every Integer is a real object instance, the garbage collector has to account for each one of them during collection.
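As a rough worked example of that arithmetic, the sketch below plugs the figures quoted above (a 12-byte header and a 4-byte int) into a small calculation. The class name is hypothetical, the real numbers depend on the JVM and its settings, and the estimate deliberately ignores the references the list itself has to hold.

```java
class BoxingFootprint {
    // Figures taken from the text above; they vary between JVMs and settings.
    static final int OBJECT_HEADER_BYTES = 12;
    static final int INT_BYTES = 4;

    /** Estimated bytes used by the Integer objects alone. */
    static long boxedBytes(long count) {
        return count * (OBJECT_HEADER_BYTES + INT_BYTES);
    }

    /** Estimated bytes used by the same values stored as plain ints. */
    static long primitiveBytes(long count) {
        return count * INT_BYTES;
    }

    public static void main(String[] args) {
        long count = 1_000_000;
        System.out.println("boxed:     " + boxedBytes(count) + " bytes");     // 16000000
        System.out.println("primitive: " + primitiveBytes(count) + " bytes"); // 4000000
    }
}
```

For one million values this comes to roughly 16 MB of Integer objects versus 4 MB of ints, before counting the extra reference per element that a List<Integer> also stores.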
To handle this, we use the awesome Trove collection library at Takipi. Trove gives up some of the generality of generics in favor of specialized primitive collections that are far more memory efficient. For example, instead of the wasteful Map<Integer, Double>, we use the specialized TIntDoubleMap:
TIntDoubleMap map = new TIntDoubleHashMap();
map.put(5, 7.0);
map.put(-1, 9.999);
...
The underlying implementation of Trove uses arrays of primitive types, so operating on these collections involves no boxing (int -> Integer) or unboxing (Integer -> int) of elements, and no wrapper objects are stored in place of the primitive values.
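To illustrate the principle without an external dependency, here is a minimal sketch of an int-only list backed by an int[]. The class IntArrayList is hypothetical and far simpler than anything in Trove; it only shows that values go in and come out as primitives, with no Integer objects created along the way.

```java
import java.util.Arrays;

/** Hypothetical int-only list; values are stored directly in an int[]. */
class IntArrayList {
    private int[] values;
    private int size;

    IntArrayList(int expectedSize) {
        values = new int[Math.max(1, expectedSize)];
    }

    void add(int value) { // takes a primitive, so nothing is boxed
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2);
        }
        values[size++] = value;
    }

    int get(int index) { // returns a primitive, so nothing is unboxed
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return values[index];
    }

    int size() { return size; }

    long sum() {
        long total = 0;
        for (int i = 0; i < size; i++) {
            total += values[i];
        }
        return total;
    }

    public static void main(String[] args) {
        IntArrayList list = new IntArrayList(4);
        for (int i = 1; i <= 10; i++) {
            list.add(i);
        }
        System.out.println(list.size() + " elements, sum " + list.sum()); // prints 10 elements, sum 55
    }
}
```

The only heap object the garbage collector has to track here is the backing array itself, no matter how many values it holds.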