The life history of Java programs

怪我咯 | Original | 2017-04-05 16:45:03

As programmers, we write code every day, but do we really understand its life cycle? Today, let's briefly walk through it. From birth to death, the life of a piece of Java code can be roughly divided into four stages: compilation, class loading, running, and GC.

Compilation

In Java, "compilation" is actually an ambiguous term. It may refer to a front-end compiler converting .java files into .class files; to the JVM's back-end runtime compiler (the JIT compiler) converting bytecode into machine code; or to a static ahead-of-time (AOT) compiler compiling .java files directly into native machine code. Here we are talking about the first kind, which also matches the common understanding of compilation. What steps does compilation go through at this stage?

Lexical and syntactic analysis

Lexical analysis converts the character stream of the source code into a sequence of tokens, while syntactic analysis builds an abstract syntax tree (AST) from that token sequence. An AST is a tree representation of the grammatical structure of program code. Each node of the tree represents a syntactic construct in the code: packages, types, modifiers, operators, interfaces, return values, and even code comments can all be syntactic constructs.
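To make the lexical step concrete, here is a minimal sketch of a tokenizer. It is not how javac is implemented; the class name TinyLexer and its simplified token rules (identifiers, integer literals, one-character symbols) are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class TinyLexer {
    /** Splits a source string into identifier, number, and single-character symbol tokens. */
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace separates tokens but is not a token itself
                continue;
            }
            int start = i;
            if (Character.isJavaIdentifierStart(c)) {
                while (i < source.length() && Character.isJavaIdentifierPart(source.charAt(i))) {
                    i++;
                }
            } else if (Character.isDigit(c)) {
                while (i < source.length() && Character.isDigit(source.charAt(i))) {
                    i++;
                }
            } else {
                i++; // operators and punctuation: one character per token in this sketch
            }
            tokens.add(source.substring(start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("int a = b + 42;"));
    }
}
```

Running main prints [int, a, =, b, +, 42, ;]. A real parser would then consume this token sequence to build the syntax tree.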

Filling the symbol table

After lexical and syntactic analysis, the next step is filling the symbol table. The information registered in the symbol table is used at various later stages of compilation. Let's expand on the concept here. What is a symbol table? It is a table made up of symbol addresses and symbol information; in its simplest form it can be thought of as the key-value pairs of a hash table. Why are symbol tables used? One of their earliest applications was organizing information about program code. Initially, computer programs were just strings of numbers, but programmers soon found it far more convenient to use symbols to represent operations and memory addresses, and associating names with numbers requires a symbol table. As programs grew, the performance of symbol table operations gradually became a bottleneck for development efficiency, and many data structures and algorithms were devised to make symbol tables faster. Broadly, these are: sequential search in an unordered linked list, binary search in an ordered array, binary search trees, balanced search trees (mainly red-black trees), and hash tables (based on separate chaining or on linear probing). In Java, java.util.TreeMap and java.util.HashMap are symbol-table implementations based on a red-black tree and a separate-chaining hash table respectively. The symbol table will not be explained further here; interested readers can consult the relevant material.

Semantic Analysis

After the previous two steps we have an abstract syntax tree for the program code. The syntax tree can represent a structurally correct source program, but it cannot guarantee that the program is logically valid. This is where semantic analysis comes in: its main task is to check the context-sensitive properties of a structurally correct source program. Attribute checking, data and control flow analysis, and desugaring are the steps of the semantic analysis stage. Let's look at syntactic sugar in more detail. Syntactic sugar is syntax added to a language that has no effect on what the language can do but makes it more convenient for programmers to use. The most commonly used syntactic sugar in Java includes generics, variable-length arguments, autoboxing/unboxing, and the enhanced for loop. The JVM does not support these constructs directly at runtime; during compilation they are reduced to simpler basic constructs, a process called desugaring. Take generic type erasure as an example: List<String> and List<Integer> are erased at compile time and become the same raw type, List.
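The effect of desugaring can be observed from ordinary Java code. The following is a small sketch (the class name SugarDemo is made up for this example) that touches the four kinds of sugar listed above and checks that two differently parameterized lists share one runtime class.

```java
import java.util.ArrayList;
import java.util.List;

final class SugarDemo {
    /** Varargs sugar: the compiler passes the arguments as an int[]. */
    static int sum(int... values) {
        int total = 0;
        for (int v : values) { // enhanced for loop: sugar for an indexed loop over the array
            total += v;
        }
        return total;
    }

    /** Generic erasure: both lists have the same runtime class. */
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        Integer boxed = 7;   // autoboxing: compiled as Integer.valueOf(7)
        int unboxed = boxed; // unboxing: compiled as boxed.intValue()
        System.out.println(sameRuntimeClass());
        System.out.println(sum(1, 2, 3) + unboxed);
    }
}
```

Running main prints true and then 13.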

Bytecode generation

Bytecode generation is the last stage of the javac compilation process. At this stage, the information produced by the previous steps is converted into bytecode and written to disk, and the compiler also performs a small amount of code addition and transformation. The instance constructor <init>() and the class constructor <clinit>() are added to the syntax tree at this stage. Note that the instance constructor here does not mean the default constructor: if the user code provides no constructor, the compiler adds a no-argument default constructor with the same accessibility as the class, but that work was already done while filling the symbol table. The <clinit>() method is produced by the compiler automatically collecting all class variable assignments and the statements in static blocks of the class, in source order. At this point the whole compilation process ends.
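Both compiler additions can be observed indirectly at runtime. The sketch below is only an illustration (GeneratedMembersDemo and its nested classes are hypothetical names): it uses reflection to find the compiler-supplied default constructor and relies on the source-order merging of static assignments and static blocks.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;

final class GeneratedMembersDemo {
    /** No constructor is written here, so the compiler supplies a public no-argument one. */
    public static class NoConstructor { }

    /** Static assignments and static blocks are merged into <clinit>() in source order. */
    static class Ordered {
        static int a = 1;
        static { a = 2; }
        static int b = a + 1;
    }

    static boolean hasPublicDefaultConstructor() {
        Constructor<?>[] ctors = NoConstructor.class.getDeclaredConstructors();
        return ctors.length == 1
                && ctors[0].getParameterCount() == 0
                && Modifier.isPublic(ctors[0].getModifiers());
    }

    public static void main(String[] args) {
        System.out.println(hasPublicDefaultConstructor());
        System.out.println(Ordered.b);
    }
}
```

Running main prints true and then 3, because the static block runs between the two field assignments.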

Class Loading

After the program has been compiled into bytecode, the next step is loading classes into memory.

Class loading takes place in the method area of the virtual machine's memory, so we first briefly introduce how a program is laid out in memory. The virtual machine's memory is divided into: the program counter, the stack, the native method stack, the heap, the method area (part of which is the runtime constant pool), and direct memory.

Program Counter

The program counter is a small area of memory that can be regarded as a line-number indicator for the bytecode being executed by the current thread. In the JVM's conceptual model, the bytecode interpreter works by changing the value of this counter to select the next bytecode instruction to execute.

Stack

The stack stores the local variable table, operand stack, dynamic linking information, method return address, and other data for each method call. The local variable table holds the primitive values and object references whose types are known at compile time. Like the program counter, the stack is private to each thread.

Native method stack

The native method stack is similar to the virtual machine stack introduced above. The difference is that the virtual machine stack serves the Java methods (bytecode) executed by the virtual machine, while the native method stack serves the native methods it uses. Some virtual machines even combine the two into one.

Heap

The heap is the largest piece of memory managed by the JVM and is shared by all threads. Its sole purpose is to store object instances, and almost all object instances are allocated here (in some virtual machine implementations, special objects such as Class objects are allocated in the method area). It is also the main area managed by the garbage collector. From the point of view of memory reclamation, garbage collectors today use generational collection algorithms (introduced in detail later), so the Java heap can be subdivided into the new generation and the old generation, and the new generation is further subdivided into the Eden space, the From Survivor space, and the To Survivor space. For efficiency, the heap may also be carved into multiple thread-private allocation buffers (thread-local allocation buffers, TLABs). However it is divided, the contents are the same: every region still stores object instances. The divisions exist only to make memory easier to allocate and reclaim.

Method area

The method area, like the heap, is a memory area shared by all threads. It stores class information loaded by the virtual machine, constants, static variables, code compiled by the just-in-time compiler, and other such data. The runtime constant pool is part of the method area; it mainly holds the literals and symbolic references generated at compile time.
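One way to see how a running JVM actually divides its memory is the standard java.lang.management API. The sketch below (MemoryPoolsDemo is a hypothetical name) lists the memory pools the JVM reports as heap and non-heap. The pool names vary by JVM and by garbage collector, so no particular output should be expected.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

final class MemoryPoolsDemo {
    /** Returns the names of the JVM memory pools of the given type (HEAP or NON_HEAP). */
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Pool names depend on the JVM and on the selected collector.
        System.out.println("heap:     " + poolNames(MemoryType.HEAP));
        System.out.println("non-heap: " + poolNames(MemoryType.NON_HEAP));
    }
}
```

On HotSpot, the heap pools typically correspond to the generations described above, while the non-heap pools cover class metadata and compiled code.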

Direct memory

Direct memory is not part of the virtual machine's runtime data areas, nor is it a memory area defined in the Java Virtual Machine Specification. You can simply think of it as off-heap memory. Its allocation is not limited by the Java heap size, but it is still limited by the total memory of the machine.
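In application code, direct memory is most commonly reached through NIO. A minimal sketch, assuming nothing beyond the standard java.nio.ByteBuffer API (the class name DirectMemoryDemo is made up):

```java
import java.nio.ByteBuffer;

final class DirectMemoryDemo {
    /** Writes a value into off-heap memory and reads it back. */
    static long roundTrip(long value) {
        ByteBuffer direct = ByteBuffer.allocateDirect(Long.BYTES); // memory outside the Java heap
        direct.putLong(0, value);
        return direct.getLong(0);
    }

    public static void main(String[] args) {
        ByteBuffer onHeap = ByteBuffer.allocate(16);        // backed by a byte[] on the heap
        ByteBuffer offHeap = ByteBuffer.allocateDirect(16); // backed by direct memory
        System.out.println(onHeap.isDirect() + " " + offHeap.isDirect());
        System.out.println(roundTrip(42L));
    }
}
```

Running main prints false true and then 42.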

With the memory areas covered, let's return to the topic. What does the class loading process consist of? Five steps: loading, verification, preparation, parsing (resolution), and initialization. Loading, verification, preparation, and initialization begin in that order, but parsing does not necessarily: it may happen after initialization.

Loading

During the loading phase, the JVM needs to do three things: first, obtain the binary byte stream that defines the class from its fully qualified name; then convert the static storage structure represented by this byte stream into a runtime data structure in the method area; and finally generate a java.lang.Class object in memory that represents the class and serves as the access point to its data in the method area. The first step does not state that the byte stream must come from a *.class file. This flexibility lets the byte stream be read from a ZIP package (the basis of the JAR, EAR, and WAR formats), fetched over the network (applets), computed at runtime (dynamic proxies), generated from other files (classes generated from JSP files), or read from a database.
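Dynamic proxies are the easiest of these sources to try out. In the sketch below, the Greeter interface and the class name RuntimeClassDemo are hypothetical; java.lang.reflect.Proxy generates the implementing class at runtime, so no .class file for it ever exists on disk.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

final class RuntimeClassDemo {
    /** Hypothetical interface used only for this illustration. */
    interface Greeter {
        String greet(String name);
    }

    /** Builds a Greeter whose implementing class is generated by the JVM at runtime. */
    static Greeter newGreeter() {
        // Every call on the proxy is routed to this handler.
        InvocationHandler handler = (proxy, method, methodArgs) -> "Hello, " + methodArgs[0];
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                handler);
    }

    public static void main(String[] args) {
        Greeter greeter = newGreeter();
        System.out.println(greeter.greet("JVM"));
        System.out.println(Proxy.isProxyClass(greeter.getClass()));
    }
}
```

Running main prints Hello, JVM and then true.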

Verification

As the name suggests, verification ensures that the information in the Class file's byte stream meets the requirements of the JVM, because a Class file does not have to come from a compiler; it can even be written directly with a hex editor. Verification consists of file format verification, metadata verification, bytecode verification, and symbolic reference verification. The specific checks are not detailed here.

Preparation

The preparation stage formally allocates memory for class variables and sets their initial values. The memory used by these variables is allocated in the method area. Note that the initial value here is normally the zero value of the type, not the value assigned in the source code.
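The zero value set during preparation can be observed by reading a static field before its own initializer has run. This is a small sketch; PreparationDemo and Config are hypothetical names.

```java
final class PreparationDemo {
    static class Config {
        // Runs inside <clinit>(), before the assignment to `value` below has executed.
        static int seenDuringInit = peek();
        static int value = 123;

        static int peek() {
            return value; // still the zero value set in the preparation stage
        }
    }

    public static void main(String[] args) {
        System.out.println(Config.seenDuringInit + " " + Config.value);
    }
}
```

Running main prints 0 123: the field held 0 until the initialization stage assigned 123.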

Parsing

The parsing (resolution) phase is the process in which the JVM replaces symbolic references in the constant pool with direct references (a pointer to the target, a relative offset, or a handle). This is where the symbol table filled in at compile time pays off. Parsing mainly applies to classes or interfaces, fields, class methods, and interface methods.

Initialization

Initialization is the last step of class loading. In the preparation phase, class variables were given their initial zero values; in this step, class variables and other resources are initialized according to the plan the programmer wrote in the code. In other words, this stage executes the <clinit>() method mentioned in the bytecode generation section. The virtual machine also guarantees that <clinit>() is properly locked and synchronized in a multi-threaded environment: only one thread executes it while the others block and wait. The author's earlier article "Talking about Concurrency from a Simple Java Singleton Example" describes a thread-safe singleton based on class initialization that relies on this; interested readers can read the two together. This raises another point we care about: when does Java trigger the initialization of a class?

When one of the four bytecode instructions new, getstatic, putstatic, or invokestatic is encountered and the class has not been initialized, its initialization is triggered. What do these four instructions correspond to? Simply put: creating an object with new, reading or setting a static field of a class (except a final static field whose value was placed in the constant pool at compile time), and calling a static method of a class.
When a method of the java.lang.reflect package is used to make a reflective call on a class that has not been initialized, its initialization is triggered.
When a class is initialized and its parent class has not yet been initialized, the parent class is initialized first.
When the virtual machine starts, the user specifies a main class to execute (the class containing the main method), and the virtual machine initializes that class first.
When the dynamic language support of JDK 1.7 or later is used, if a java.lang.invoke.MethodHandle instance finally resolves to a REF_getStatic, REF_putStatic, or REF_invokeStatic method handle and the class of that handle has not been initialized, its initialization is triggered.
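The first and third triggers in the list above can be sketched as follows. The class names are hypothetical, and the static blocks record their execution in a list so that the order can be inspected.

```java
import java.util.ArrayList;
import java.util.List;

final class InitTriggerDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static { LOG.add("Parent"); }
    }

    static class Child extends Parent {
        static int counter = 1;      // read with getstatic
        static final int LIMIT = 10; // compile-time constant, inlined by javac
        static { LOG.add("Child"); }
    }

    /** Reading a compile-time constant does not initialize the class. */
    static List<String> initializedAfterConstantRead() {
        int limit = Child.LIMIT;
        return new ArrayList<>(LOG);
    }

    /** Reading an ordinary static field initializes the parent first, then the class. */
    static List<String> initializedAfterFieldRead() {
        int counter = Child.counter;
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(initializedAfterConstantRead());
        System.out.println(initializedAfterFieldRead());
    }
}
```

In a fresh JVM, main prints [] and then [Parent, Child]. Each class is initialized only once, so repeated calls do not add further entries.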

Run

After the two stages above, the program starts to run. Running a program means executing all kinds of instructions, so how is it actually executed? This is where the back-end compiler (the JIT compiler) mentioned at the beginning and the interpreter come in (the HotSpot virtual machine uses an interpreter together with a compiler by default); the bytecode execution engine is responsible for carrying out the program's computations. When executing Java code, the engine may choose interpreted execution (through the interpreter), compiled execution (native code produced by the JIT compiler), or a mix of both. The stack frame is the data structure that supports method invocation and execution in the virtual machine. The way instructions push operands onto and pop them off the operand stack recalls a classic algorithm, Dijkstra's two-stack algorithm for expression evaluation; the details are not covered here, and interested readers can look them up. Runtime optimization is equally important at this stage: the JVM design team concentrated performance optimization here, so that Class files not produced by javac can also benefit from compiler optimization. There are many specific optimization techniques; representative ones include common subexpression elimination, array bounds check elimination, method inlining, and escape analysis.
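The idea of interpreting instructions against an operand stack, with a program counter selecting the next instruction, can be sketched with a toy interpreter. The opcodes below are invented for this example and only mimic the style of JVM bytecode; this is not how HotSpot's interpreter is written.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class TinyStackMachine {
    // Hypothetical opcodes for this sketch; they only mimic the style of JVM instructions.
    static final int PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    /** Interprets the program and returns the value left on top of the operand stack. */
    static int run(int[] code) {
        Deque<Integer> operands = new ArrayDeque<>();
        int pc = 0; // program counter: index of the next instruction
        while (true) {
            int opcode = code[pc++];
            switch (opcode) {
                case PUSH:
                    operands.push(code[pc++]); // the operand follows the opcode
                    break;
                case ADD:
                    operands.push(operands.pop() + operands.pop());
                    break;
                case MUL:
                    operands.push(operands.pop() * operands.pop());
                    break;
                case HALT:
                    return operands.pop();
                default:
                    throw new IllegalStateException("unknown opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        // (2 + 3) * 4
        int[] program = { PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT };
        System.out.println(run(program));
    }
}
```

Running main prints 20. A JIT compiler would instead translate such a frequently executed instruction sequence into native code.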

GC

Finally, the program reaches the end of its life. How does the JVM decide that an object is dead? It uses reachability analysis. The basic idea is to take a set of objects called "GC Roots" as starting points and search downward from them. The paths traversed by the search are called reference chains. When no reference chain connects an object to the GC Roots (in graph theory terms, the object is unreachable from the GC Roots), the object is no longer usable and is judged to be reclaimable. Once we know which objects can be reclaimed, when is garbage collection triggered? At safepoints, which are the places where the program can pause so that GC can run. It follows that pause time is the core concern of garbage collection: garbage collection algorithms and the collectors derived from them all aim to minimize GC pauses. The recent G1 collector can build a predictable pause-time model and avoids collecting the entire Java heap at once by planning region-by-region collection. When we introduced the memory areas, we mentioned the new generation and the old generation. Different garbage collectors may work on the new generation or on the old generation, and some (such as G1) manage the whole heap rather than a single generation. With that said, the garbage collection algorithms and the corresponding collectors are introduced below.
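Reachability analysis is essentially a graph search. The sketch below models objects as strings and references as an adjacency map; both are simplifications invented for this example, not the JVM's actual data structures.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ReachabilityDemo {
    /** Returns every object reachable from the roots by following references. */
    static Set<String> reachable(Map<String, List<String>> references, List<String> roots) {
        Set<String> marked = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (marked.add(current)) { // visit each object only once
                pending.addAll(references.getOrDefault(current, Collections.<String>emptyList()));
            }
        }
        return marked;
    }

    /** A is a GC root and references B; C and D reference only each other. */
    static Set<String> demo() {
        Map<String, List<String>> references = new HashMap<>();
        references.put("A", Arrays.asList("B"));
        references.put("C", Arrays.asList("D"));
        references.put("D", Arrays.asList("C"));
        return reachable(references, Arrays.asList("A"));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Running main prints [A, B]. C and D are reclaimable even though they reference each other, because no reference chain leads to them from a root.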

Mark-sweep algorithm

This is the most basic collection algorithm. It has two phases, mark and sweep: first all objects to be reclaimed are marked, and once marking is complete, all marked objects are reclaimed together. Its main shortcomings are that it is not efficient and that it leaves behind a large number of non-contiguous memory fragments. When the program later needs to allocate a large object, it may fail to find a large enough contiguous block even though the heap has enough free memory in total, and another GC has to be triggered early. The corresponding garbage collector is the CMS collector.
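The fragmentation problem is easy to reproduce on a toy heap. In the sketch below the heap is an int array in which each slot holds an object id and 0 means free; this representation and the class name are assumptions made purely for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class MarkSweepDemo {
    static final int FREE = 0;

    /** Sweep phase: frees every slot whose object id is not in the live set. */
    static void sweep(int[] heap, Set<Integer> live) {
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != FREE && !live.contains(heap[i])) {
                heap[i] = FREE;
            }
        }
    }

    /** Length of the longest run of adjacent free slots. */
    static int largestFreeBlock(int[] heap) {
        int best = 0;
        int run = 0;
        for (int slot : heap) {
            run = (slot == FREE) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] heap = { 1, 2, 3, 4, 5, 6, 7, 8 }; // one object per slot
        Set<Integer> live = new HashSet<>(Arrays.asList(1, 3, 5, 7));
        sweep(heap, live);
        System.out.println(Arrays.toString(heap));
        System.out.println("largest free block: " + largestFreeBlock(heap));
    }
}
```

After the sweep the heap is [1, 0, 3, 0, 5, 0, 7, 0]: four slots are free, but the largest contiguous block is a single slot, so an object needing two slots cannot be placed.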

Copy algorithm

The copying algorithm was created to solve the efficiency problem. It divides the available memory into two blocks of equal size and uses only one of them at a time. When that block is used up, the objects that are still alive are copied to the other block, and the used block is then cleaned up in one pass. Each collection therefore works on one whole half, and memory fragmentation does not arise. Most of today's commercial virtual machines use this algorithm to collect the new generation, although the memory is not split 1:1. In HotSpot, for example, the new generation consists of one Eden space and two Survivor spaces, and the default ratio of Eden to each Survivor is 8:1. Eden and one of the Survivor spaces are in use at any time, so 90% of the new generation is available. During a collection, the surviving objects in Eden and the Survivor space in use are copied in one go to the other Survivor space, and then Eden and the Survivor space just used are cleaned up. Careful readers may ask: what if the unused Survivor space is not large enough during the copy? In that case the old generation is relied on as an allocation guarantee. If the guarantee succeeds, the surviving objects from Eden and the Survivor space that do not fit are moved to the old generation; if it fails, a garbage collection has to be triggered in the old generation.

To expand on this point: garbage collection in the new generation is called Minor GC. Because most Java objects die young, Minor GC is very frequent and usually fast. Garbage collection in the old generation is called Major GC or Full GC, and it is generally much slower than Minor GC. From the analysis above it is easy to see that a Major GC is often accompanied by a Minor GC, though not always. The goal of GC tuning is therefore to keep GC pauses under control and to reduce the frequency of Major GC as much as possible. The corresponding garbage collectors are the Serial collector, the ParNew collector (a multi-threaded version of the Serial collector that can work with the old-generation CMS collector mentioned above), and the Parallel Scavenge collector.
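The copying step and the 90% figure can both be sketched in a few lines. As in the other toy examples, the heap is an int array of object ids with 0 meaning free; the class and method names are hypothetical, and the sketch assumes the target space is large enough for all survivors (a real collector would fall back on the old generation).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class CopyingDemo {
    static final int FREE = 0;

    /**
     * Copies live objects from one space to the start of the other, then clears the source.
     * Assumes the target space has room for every survivor.
     */
    static int copyLive(int[] from, int[] to, Set<Integer> live) {
        int next = 0; // next free slot in the target space
        for (int object : from) {
            if (object != FREE && live.contains(object)) {
                to[next++] = object;
            }
        }
        Arrays.fill(from, FREE); // the whole source space is reclaimed at once
        return next;
    }

    /** Share of the new generation usable at any time for an Eden:Survivor ratio of ratio:1. */
    static double usableFraction(int ratio) {
        return (ratio + 1) / (double) (ratio + 2); // Eden + one Survivor over Eden + two Survivors
    }

    public static void main(String[] args) {
        int[] from = { 1, 2, 3, 4 };
        int[] to = new int[4];
        Set<Integer> live = new HashSet<>(Arrays.asList(2, 4));
        int copied = copyLive(from, to, live);
        System.out.println(Arrays.toString(from) + " " + Arrays.toString(to) + " copied=" + copied);
        System.out.println(usableFraction(8));
    }
}
```

The survivors end up contiguous in the target space, so no fragmentation remains, and usableFraction(8) evaluates to 9/10, the 90% quoted above.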

Mark-compact algorithm

This algorithm is used for garbage collection in the old generation. Objects in the old generation have a high survival rate, so the copying algorithm would have to perform many copy operations and would also waste space, which makes it a poor fit there. The marking process of mark-compact is the same as in mark-sweep, but instead of reclaiming the dead objects in place, all surviving objects are moved toward one end, and the memory beyond the end boundary is then cleaned up directly. The corresponding garbage collectors are the Serial Old collector and the Parallel Old collector.
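The compaction step can be sketched on the same kind of toy heap: an int array of object ids with 0 meaning free. This is a simplified illustration under that assumption; it ignores the reference updating a real collector must perform when objects move.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class MarkCompactDemo {
    static final int FREE = 0;

    /** Slides live objects to the start of the heap and frees everything after them. */
    static int compact(int[] heap, Set<Integer> live) {
        int boundary = 0; // first slot after the compacted live objects
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != FREE && live.contains(heap[i])) {
                heap[boundary++] = heap[i]; // boundary never passes i, so nothing live is overwritten
            }
        }
        Arrays.fill(heap, boundary, heap.length, FREE);
        return boundary;
    }

    public static void main(String[] args) {
        int[] heap = { 1, 2, 3, 4, 5, 6, 7, 8 };
        Set<Integer> live = new HashSet<>(Arrays.asList(1, 3, 5, 7));
        int boundary = compact(heap, live);
        System.out.println(Arrays.toString(heap) + " boundary=" + boundary);
    }
}
```

Running main prints [1, 3, 5, 7, 0, 0, 0, 0] boundary=4: the same survivors as in a mark-sweep pass, but the free space is now one contiguous block.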

Generational collection algorithm

All current commercial virtual machines use this algorithm. As mentioned earlier, the idea is to divide the heap into generations, the new generation and the old generation, and to apply a different garbage collection algorithm to each region. The new generation uses the copying algorithm, and the old generation uses the mark-compact or mark-sweep algorithm.

Review

After all of the above, you may have some sense of the life of Java code, or you may still find it unclear. Let's review the whole process with an example: what happens when we create an object with new? When the JVM encounters a new instruction, it first checks whether the instruction's operand can locate a symbolic reference to a class in the constant pool of the method area, and whether the class represented by that symbolic reference has been loaded, parsed, and initialized. If not, the class loading process must be carried out first. Once the class loading check passes, the JVM allocates memory for the new object on the heap. The amount of memory needed is fully determined once class loading is complete. If the heap memory is regular (used memory on one side, free memory on the other), allocation only requires moving a pointer by the size of the object; this is called "bump the pointer". If used and free memory are interleaved, the JVM maintains a list recording which blocks are available, allocates from it, and updates its records; this is called a "free list". Which method is used depends on the garbage collector in charge of the heap, as discussed earlier. After the memory has been set aside, the virtual machine performs the necessary initialization (zeroing the allocated memory) and then makes the necessary settings on the object. This information is stored in the object header (class metadata, the object's hash code, its GC generation age, and so on). After these steps, a new object exists from the virtual machine's point of view. It is not finished yet, however: next the <init>() method is called to assign the object's fields as the programmer intended, and finally the reference on the stack is set to point to the object's memory address in the heap (a direct reference). Only now has a truly usable object been produced.
As for the later operations on the object and its eventual death, those are the work of the bytecode execution engine and the GC described earlier, which should no longer be unfamiliar.
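The two allocation strategies mentioned in the review can be sketched as follows. Addresses are plain integers and all class names are hypothetical; the free-list version uses a simple first-fit policy, which is an assumption of this sketch rather than something the JVM prescribes.

```java
import java.util.ArrayList;
import java.util.List;

final class AllocationDemo {
    /** Bump-the-pointer allocation over a contiguous region of `capacity` bytes. */
    static final class BumpAllocator {
        private final int capacity;
        private int top = 0; // everything below `top` is in use

        BumpAllocator(int capacity) {
            this.capacity = capacity;
        }

        /** Returns the start offset of the new object, or -1 if it does not fit. */
        int allocate(int size) {
            if (size <= 0 || top + size > capacity) {
                return -1;
            }
            int address = top;
            top += size; // allocation is just a pointer move
            return address;
        }
    }

    /** First-fit allocation from a list of free blocks, each stored as {start, size}. */
    static final class FreeListAllocator {
        private final List<int[]> freeBlocks = new ArrayList<>();

        FreeListAllocator(int[][] blocks) {
            for (int[] block : blocks) {
                freeBlocks.add(block.clone());
            }
        }

        /** Returns the start offset of the new object, or -1 if no block is large enough. */
        int allocate(int size) {
            if (size <= 0) {
                return -1;
            }
            for (int i = 0; i < freeBlocks.size(); i++) {
                int[] block = freeBlocks.get(i);
                if (block[1] >= size) {
                    int address = block[0];
                    block[0] += size; // shrink the block from the front
                    block[1] -= size;
                    if (block[1] == 0) {
                        freeBlocks.remove(i); // block fully used, drop it from the list
                    }
                    return address;
                }
            }
            return -1;
        }
    }

    public static void main(String[] args) {
        BumpAllocator bump = new BumpAllocator(64);
        System.out.println(bump.allocate(16) + " " + bump.allocate(24) + " " + bump.allocate(100));

        FreeListAllocator list = new FreeListAllocator(new int[][] { { 0, 8 }, { 32, 16 } });
        System.out.println(list.allocate(12) + " " + list.allocate(8) + " " + list.allocate(8));
    }
}
```

Running main prints 0 16 -1 for the bump allocator and 32 0 -1 for the free list. Bump allocation suits collectors that compact or copy, since they leave the free memory contiguous, while a free list suits mark-sweep style collectors that leave holes.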
