
Detailed introduction to Java virtual machine architecture

零下一度
2017-06-25 13:33:57

Life cycle of JAVA virtual machine

The job of a running Java virtual machine instance is to run one Java program. When a Java program starts, a virtual machine instance is born; when the program exits, the instance dies. If three Java programs run at the same time on the same computer, there are three Java virtual machine instances. Each Java program runs in its own instance.

A Java virtual machine instance runs a Java program by calling the main() method of an initial class. The main() method must be public, static, return void, and accept a string array as a parameter. Any class with such a main() method can be used as the starting point for running a Java program.

public class Test {
    public static void main(String[] args) {
        System.out.println("Hello World");
    }
}

In the above example, the main() method in the initial class of the Java program will be the starting point of the initial thread of the program, and any other threads are started by this initial thread.

There are two types of threads inside the Java virtual machine: daemon threads and non-daemon threads. Daemon threads are usually used by the virtual machine itself, such as threads that perform garbage collection tasks. However, a Java program can also mark any thread it creates as a daemon thread. The initial thread in the Java program - the one that starts in main(), is a non-daemon thread.

As long as there are any non-daemon threads running, the Java program will continue to run. When all non-daemon threads in the program terminate, the virtual machine instance will automatically exit. If the security manager allows it, the program itself can also exit by calling the exit() method of the Runtime class or System class.
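The daemon flag described above can be inspected and set through the standard Thread API. The following is a minimal sketch, not part of the original article; the class name DaemonThreadDemo and the helper runWorker are made up for illustration.

```java
public class DaemonThreadDemo {
    /** Creates a short-lived worker, marks it as requested, and reports its daemon flag. */
    public static boolean runWorker(boolean daemon) {
        Thread worker = new Thread(() -> {
            // Intentionally empty: the worker finishes at once.
        });
        worker.setDaemon(daemon);          // must be called before start()
        boolean flag = worker.isDaemon();  // what the virtual machine recorded
        worker.start();
        try {
            worker.join();                 // wait so the example ends cleanly
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return flag;
    }

    public static void main(String[] args) {
        System.out.println("worker marked as daemon: " + runWorker(true));
        System.out.println("worker left non-daemon:  " + runWorker(false));
    }
}
```

Because setDaemon() has to be called before start(), a program decides up front whether a thread should keep the virtual machine alive.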

JAVA virtual machine architecture

The following figure shows the structure of the Java virtual machine. Every Java virtual machine has a class loading subsystem, which loads types (classes or interfaces) given their fully qualified names. Likewise, every Java virtual machine has an execution engine, which is responsible for executing the instructions contained in the methods of loaded classes.

 

When the Java virtual machine runs a program, it needs memory to store many things: bytecode, other information extracted from loaded class files, objects created by the program, parameters passed to methods, return values, local variables, and so on. The Java virtual machine organizes these into several "runtime data areas" for easier management.

Some runtime data areas are shared by all threads in the program, while others belong to a single thread. Each Java virtual machine instance has one method area and one heap, both shared by all threads in that instance. When the virtual machine loads a class file, it parses the type information from the binary data in the class file and places it in the method area. As the program runs, the virtual machine places every object the program creates on the heap.

 

Each new thread gets its own PC register (program counter) and its own Java stack when it is created. If the thread is executing a Java method (not a native method), the PC register always points to the next instruction to be executed, and the Java stack holds the state of the thread's Java method invocations: their local variables, the parameters they were called with, their return values, and intermediate results of computation. The state of native method invocations is stored in an implementation-dependent way in the native method stack, or possibly in registers or some other implementation-specific memory area.

The Java stack is composed of many stack frames. A stack frame contains the status of a Java method call. When a thread calls a Java method, the virtual machine pushes a new stack frame into the thread's Java stack. When the method returns, the stack frame is popped from the Java stack and discarded.
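The push-per-call behavior can be observed indirectly from Java code by counting the entries of a stack trace. This is only a sketch under the assumption that each stack trace element corresponds to one frame; StackFrameDemo and framesAtDepth are illustrative names.

```java
public class StackFrameDemo {
    /** Makes `depth` nested calls, then reports how many frames the stack trace shows. */
    public static int framesAtDepth(int depth) {
        if (depth == 0) {
            return new Throwable().getStackTrace().length;
        }
        return framesAtDepth(depth - 1); // each call pushes one more frame
    }

    public static void main(String[] args) {
        int shallow = framesAtDepth(0);
        int deep = framesAtDepth(5);
        System.out.println("extra frames for 5 nested calls: " + (deep - shallow));
    }
}
```

Once the calls return, their frames are popped again, so a later call to framesAtDepth(0) reports the same shallow count.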

The Java virtual machine has no registers, and its instruction set uses the Java stack to store intermediate data. The reason for this design is to keep the Java virtual machine's instruction set as compact as possible and to facilitate the implementation of the Java virtual machine on platforms with few general-purpose registers. In addition, the stack-based architecture of the Java virtual machine also helps code optimization of dynamic compilers and just-in-time compilers implemented by some virtual machines during runtime.

The following figure depicts the memory area created by the Java virtual machine for each thread. These memory areas are private and no thread can access the PC register or Java stack of another thread.

 

The above figure shows a snapshot of a virtual machine instance with three threads executing. Both thread 1 and thread 2 are executing Java methods, while thread 3 is executing a native method.

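The Java stacks all grow downward, and the top of each stack is shown at the bottom of the figure. The stack frame of the currently executing method is drawn in a light shade. For a thread that is running a Java method, the PC register always points to the next instruction to be executed; the PC registers of thread 1 and thread 2, for example, are drawn in a light shade. Because thread 3 is currently executing a native method, its PC register, the one drawn in a dark shade, has an undefined value.

Data types

The Java virtual machine performs its computations on certain data types. These fall into two groups: primitive types and reference types. Variables of a primitive type hold primitive values, and variables of a reference type hold reference values.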

  

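All primitive types of the Java language are also primitive types of the Java virtual machine. boolean is a special case, however: although the virtual machine treats boolean as a primitive type, its instruction set offers only limited support for it. When the compiler translates Java source code into bytecode, it represents boolean values with int or byte. In the Java virtual machine, false is represented by the integer zero and any non-zero integer represents true, and operations on boolean values use int. In addition, boolean arrays are accessed as byte arrays, although on the heap they may also be stored as bit fields.

The Java virtual machine also has a primitive type that is used only internally: returnAddress. Java programmers cannot use this type; it exists to implement the finally clause of Java programs. The type is used by the jsr, ret, and jsr_w instructions, and its value is a pointer to the opcode of a JVM instruction. A returnAddress is not a numeric value in the ordinary sense, does not correspond to any primitive type of the Java language, and cannot be modified by the running program.

The reference types of the Java virtual machine are collectively called "reference". There are three kinds: class types, interface types, and array types, and their values are all references to dynamically created objects. A value of a class type is a reference to a class instance. A value of an array type is a reference to an array object; in the Java virtual machine an array is a genuine object. A value of an interface type is a reference to an instance of some class that implements the interface. There is also a special reference value, null, which indicates that the reference variable refers to no object.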
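The three kinds of reference and the null reference can be told apart at run time. The sketch below is illustrative only; ReferenceTypeDemo and describe are hypothetical names, and the classification strings are my own.

```java
import java.util.ArrayList;
import java.util.List;

public class ReferenceTypeDemo {
    /** Describes what a reference value refers to. */
    public static String describe(Object ref) {
        if (ref == null) {
            return "null (refers to no object)";
        }
        return ref.getClass().isArray() ? "array object" : "class instance";
    }

    public static void main(String[] args) {
        StringBuilder classRef = new StringBuilder("lava"); // class type
        List<String> interfaceRef = new ArrayList<>();      // interface type, refers to a class instance
        int[] arrayRef = new int[3];                        // array type
        Object nullRef = null;                              // refers to nothing

        System.out.println(describe(classRef));
        System.out.println(describe(interfaceRef));
        System.out.println(describe(arrayRef));
        System.out.println(describe(nullRef));
    }
}
```

Note that an int[] can be passed where an Object is expected, which is what "an array is a genuine object" means in practice.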

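Passing references as method parameters in Java

Parameter passing in Java is commonly described as taking two forms: pass by value and pass by reference. Strictly speaking, Java always passes by value, and what is called pass by reference here is the passing of a copy of an object reference. Pass by value for primitives needs no further explanation, so the rest of this section covers the second case.

"Passing an object to a method as an argument" is what is meant by pass by reference here.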

public class User {
    private String name;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

public class Test {
    public void set(User user) {
        user.setName("hello world");
    }

    public static void main(String[] args) {
        Test test = new Test();
        User user = new User();
        test.set(user);
        System.out.println(user.getName());
    }
}

The code above prints "hello world", which needs no explanation. But what is printed if the set method is changed as follows?

public void set(User user) {
    user.setName("hello world");
    user = new User();
    user.setName("change");
}

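The answer is still "hello world". Let us analyze the code above.

First,

User user = new User();

creates an object on the heap and a reference on the stack that points to the object, as shown in the following figure: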

 

test.set(user);

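passes the reference user to the set method as an argument. Note that what is passed is not the reference itself but a copy of it. At this point there are two references (the reference and its copy) that both point to the object on the heap, as shown in the following figure: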

 

user.setName("hello world");

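In the set() method, the "copy of the user reference" operates on the User object on the heap and sets its name property to the string "hello world", as shown in the following figure: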

  

 

user = new User();

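In the set() method, another User object is created, and the "copy of the user reference" is made to point to this newly created object on the heap, as shown in the following figure: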

  

user.setName("change");

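In the set() method, the "copy of the user reference" now operates on the newly created User object on the heap.

Once the set() method has finished, attention returns to the main() method: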

System.out.println(user.getName());

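Because the "copy of the user reference" had already set the name property of the original User object on the heap to "hello world", calling getName() on user in the main() method prints "hello world", as shown in the following figure: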

  

Class loading subsystem

In the JAVA virtual machine, the part responsible for finding and loading types is called the class loading subsystem.

The Java virtual machine has two kinds of class loaders: the bootstrap class loader (also known as the startup class loader) and user-defined class loaders. The former is part of the Java virtual machine implementation, and the latter are part of the Java program. Classes loaded by different class loaders are placed in different namespaces inside the virtual machine.

The class loader subsystem involves several other components of the Java virtual machine, as well as several classes from the java.lang library. For example, a user-defined class loader is an ordinary Java object, and its class must be derived from the java.lang.ClassLoader class. The methods defined in ClassLoader provide an interface for programs to access the class loader mechanism. In addition, for each loaded type, the JAVA virtual machine will create an instance of the java.lang.Class class to represent the type. Like all other objects, user-defined class loaders and instances of the Class class are placed in the heap area in memory, and the loaded type information is located in the method area.
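Which loader produced a type can be queried through the Class object. In the sketch below (ClassLoaderDemo and describeLoader are illustrative names), a null result from getClassLoader() is the conventional way the API reports the loader built into the virtual machine; the exact class name printed for the application's own loader varies between JDK versions, so none is assumed here.

```java
public class ClassLoaderDemo {
    /** Names the loader of a type; null from getClassLoader() means the built-in loader. */
    public static String describeLoader(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        return loader == null ? "built-in (bootstrap) loader" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        // java.lang.String is a core class loaded by the virtual machine itself.
        System.out.println("String:          " + describeLoader(String.class));
        // This class is loaded by an ordinary ClassLoader object living on the heap.
        System.out.println("ClassLoaderDemo: " + describeLoader(ClassLoaderDemo.class));
    }
}
```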

In addition to locating and importing binary class files, the class loading subsystem is also responsible for verifying the correctness of imported classes, allocating and initializing memory for class variables, and helping to resolve symbolic references. These actions must be performed strictly in the following order:

(1) Loading - Find and load the binary data of the type.

(2) Linking - Perform verification, preparation, and (optionally) resolution.

● Verification - Ensure the correctness of the imported type.

● Preparation - Allocate memory for class variables and initialize them to default values.

● Resolution - Convert the symbolic references in the type into direct references.

(3) Initialization - Initialize class variables to their correct initial values.
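The gap between loading and initialization can be made visible with the three-argument form of Class.forName, which lets the caller ask for a class without initializing it. This is a sketch only; LoadLinkInitDemo, Lazy, and trace are hypothetical names, and the experiment is cached because a class is initialized at most once.

```java
public class LoadLinkInitDemo {
    static boolean lazyInitialized = false;

    /** Hypothetical helper type whose static initializer records that it ran. */
    static class Lazy {
        static int value = 42;
        static {
            lazyInitialized = true;
        }
    }

    private static String cachedTrace;

    /** Runs the experiment once and returns what was observed. */
    public static synchronized String trace() {
        if (cachedTrace == null) {
            try {
                // Load Lazy, but ask the virtual machine not to initialize it yet.
                Class.forName(Lazy.class.getName(), false, LoadLinkInitDemo.class.getClassLoader());
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
            boolean afterLoading = lazyInitialized;
            int value = Lazy.value;              // first use of a class variable triggers initialization
            boolean afterUse = lazyInitialized;
            cachedTrace = "afterLoading=" + afterLoading + ", afterUse=" + afterUse + ", value=" + value;
        }
        return cachedTrace;
    }

    public static void main(String[] args) {
        System.out.println(trace());
    }
}
```

The static initializer runs only at the first use of Lazy.value, which is step (3) in the list above.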

Every Java virtual machine implementation must have a bootstrap (startup) class loader that knows how to load trusted classes.

Each class loader has its own namespace, which keeps track of the types it has loaded. A Java program can therefore load several types with the same fully qualified name. In that case the fully qualified name alone is not enough to identify a type uniquely within a Java virtual machine instance. When multiple class loaders have loaded types with the same name, the type name must be qualified with the identity of the class loader that loaded it (which indicates the namespace it lives in) to identify the type uniquely.

Method Area

In the Java virtual machine, information about loaded types is stored in a logical memory area called the method area. When the virtual machine loads a type, it uses a class loader to locate the corresponding class file, reads the class file (a linear stream of binary data), and passes it to the virtual machine. The virtual machine then extracts the type information and stores it in the method area. The type's class (static) variables are also stored in the method area.

How the JAVA virtual machine stores type information internally is determined by the designer of the specific implementation.

When the virtual machine runs a Java program, it looks up and uses the type information stored in the method area. Since all threads share the method area, access to method area data must be designed to be thread-safe. For example, suppose two threads try to access a class named Lava at the same time and the class has not yet been loaded into the virtual machine. Only one thread should load it, while the other must wait.

For each loaded type, the virtual machine will store the following type information in the method area:

● The fully qualified name of the type

● The fully qualified name of the type's direct superclass (unless the type is java.lang.Object, which has no superclass)

● Whether the type is a class or an interface

● The type's access modifiers (some subset of public, abstract, final)

● An ordered list of the fully qualified names of any direct superinterfaces
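Much of the basic type information listed above is exposed to programs through java.lang.Class. The following sketch reads it back by reflection; TypeInfoDemo and describe are illustrative names, and the output format is my own.

```java
import java.lang.reflect.Modifier;

public class TypeInfoDemo {
    /** Collects the basic type information of a loaded type into one line of text. */
    public static String describe(Class<?> type) {
        Class<?> superclass = type.getSuperclass();
        StringBuilder interfaces = new StringBuilder();
        for (Class<?> direct : type.getInterfaces()) {   // direct superinterfaces, in order
            if (interfaces.length() > 0) {
                interfaces.append(' ');
            }
            interfaces.append(direct.getName());
        }
        return "name=" + type.getName()
                + ", superclass=" + (superclass == null ? "none" : superclass.getName())
                + ", kind=" + (type.isInterface() ? "interface" : "class")
                + ", modifiers=" + Modifier.toString(type.getModifiers())
                + ", interfaces=[" + interfaces + "]";
    }

    public static void main(String[] args) {
        System.out.println(describe(Object.class));
        System.out.println(describe(Integer.class));
        System.out.println(describe(Runnable.class));
    }
}
```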

In addition to the basic type information listed above, the virtual machine must also store the following information for each loaded type:

● The constant pool of the type

● Field information

● Method information

● All class (static) variables except constants

● A reference to the ClassLoader class

● A reference to the Class class

 Constant pool

The virtual machine must maintain a constant pool for each loaded type. A constant pool is an ordered collection of constants used by the type, including direct constants and symbolic references to other types, fields, and methods. Data items in a pool are accessed by index just like an array. Because the constant pool stores symbolic references to all types, fields, and methods used by the corresponding type, it plays a core role in the dynamic linking of Java programs.

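Field information

For each field declared in a type, the method area must hold the following information. In addition, the order in which the fields are declared in the class or interface must also be recorded.

○ Field name

○ Field type

○ Field modifiers (some subset of public, private, protected, static, final, volatile, transient)

Method information

For each method declared in a type, the method area must hold the following information. As with fields, the order in which the methods are declared in the class or interface must also be recorded.

○ Method name

○ Method return type (or void)

○ The number and types of the method's parameters (in declaration order)

○ Method modifiers (some subset of public, private, protected, static, final, synchronized, native, abstract)

In addition to the items listed above, if a method is neither abstract nor native, the following information must also be stored:

○ The method's bytecodes

○ The sizes of the operand stack and of the local variable area in the method's stack frame

○ The exception table

Class (static) variables

Class variables are shared by all instances of a class, but they can be accessed even when no instance exists. These variables belong to the class rather than to its instances, so they are always stored in the method area as part of the type information. Apart from compile-time constants declared in the class, the virtual machine must allocate space in the method area for these class variables before it uses the class.

Compile-time constants (class variables declared final and initialized with a value known at compile time) are handled differently from ordinary class variables. Every type that uses a compile-time constant gets its own copy of it, either in its constant pool or embedded in its bytecode stream. As part of a constant pool or bytecode stream, compile-time constants are stored in the method area just like ordinary class variables. But whereas ordinary class variables are stored as part of the data of the type that declares them, compile-time constants are stored as part of the types that use them.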
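One visible consequence of copying compile-time constants into the using type is that reading such a constant does not initialize the class that declares it. The sketch below assumes compilation with javac; ConstantInliningDemo, Holder, and trace are hypothetical names.

```java
public class ConstantInliningDemo {
    static boolean holderInitialized = false;

    /** Hypothetical type with one compile-time constant and one ordinary class variable. */
    static class Holder {
        static final int LIMIT = 10;   // compile-time constant, copied into the using class
        static int counter = 1;        // ordinary class variable, stored with Holder
        static {
            holderInitialized = true;
        }
    }

    private static String cachedTrace;

    /** Runs the experiment once, since a class is initialized at most once. */
    public static synchronized String trace() {
        if (cachedTrace == null) {
            int limit = Holder.LIMIT;                 // no access to Holder at run time
            boolean afterConstant = holderInitialized;
            int counter = Holder.counter;             // real access, initializes Holder
            boolean afterVariable = holderInitialized;
            cachedTrace = "afterConstant=" + afterConstant
                    + ", afterVariable=" + afterVariable
                    + ", limit=" + limit + ", counter=" + counter;
        }
        return cachedTrace;
    }

    public static void main(String[] args) {
        System.out.println(trace());
    }
}
```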

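A reference to the ClassLoader class

When a type is loaded, the virtual machine must keep track of whether it was loaded by the bootstrap (startup) class loader or by a user-defined class loader. If it was loaded by a user-defined class loader, the virtual machine must store a reference to that loader in the type information. This is stored as part of the type's data in the method area.

The virtual machine uses this information during dynamic linking. When one type refers to another, the virtual machine asks the class loader that loaded the referring type to load the referenced type. This dynamic linking process is also central to how the virtual machine keeps namespaces separate. To perform dynamic linking correctly and maintain multiple namespaces, the virtual machine needs to know, from the method area, which class loader loaded each class.

A reference to the Class class

For every loaded type (whether class or interface), the virtual machine creates a corresponding instance of java.lang.Class, and it must somehow associate that instance with the type data stored in the method area.

A Java program can obtain and use references to Class objects. A static method of the Class class lets you obtain a reference to the Class instance of any loaded class.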

public static Class<?> forName(String className)

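For example, calling forName("java.lang.Object") returns a reference to the Class object that represents java.lang.Object. forName() can be used to obtain a reference to the Class object of any type in any package, provided the type can be (or already has been) loaded into the current namespace. If the virtual machine cannot load the requested type into the current namespace, a ClassNotFoundException is thrown.

Another way to obtain a reference to a Class object is to call getClass() on any object reference. Every object inherits this method from the Object class itself: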

public final native Class<?> getClass();

For example, if you hold a reference to an object of class java.lang.Integer, simply calling getClass() on that reference gives you the Class object that represents java.lang.Integer.
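Both routes to a Class object, forName() and getClass(), can be tried in a few lines. This is a small sketch; ClassObjectDemo and lookup are illustrative names, and "no.such.Type" is a deliberately nonexistent class name chosen for the failure case.

```java
public class ClassObjectDemo {
    /** Returns the Class object for a name, or null if the type cannot be loaded. */
    public static Class<?> lookup(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Object number = Integer.valueOf(7);

        // forName() and the class literal yield the same Class object.
        System.out.println(lookup("java.lang.Object") == Object.class);
        // getClass() reports the run-time class of the referenced object.
        System.out.println(number.getClass() == Integer.class);
        // A type that cannot be loaded results in ClassNotFoundException (mapped to null here).
        System.out.println(lookup("no.such.Type"));
    }
}
```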

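An example of using the method area

To show how the virtual machine uses the information in the method area, consider the following example:

class Lava {
    private int speed = 5;

    void flow() {
    }
}

public class Volcano {
    public static void main(String[] args) {
        Lava lava = new Lava();
        lava.flow();
    }
}

Different virtual machine implementations may handle this in completely different ways. What follows describes only one possibility, not the only one.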

To run the Volcano program, you first give the virtual machine the name "Volcano" in some implementation-dependent way. The virtual machine then finds and reads the corresponding class file "Volcano.class", extracts the type information from the binary data in the class file, and places it in the method area. The virtual machine starts executing the main() method by executing the bytecode stored in the method area. During execution it keeps a pointer to the constant pool (a data structure in the method area) of the current class (the Volcano class).

Note: when the virtual machine starts executing the bytecode of main() in the Volcano class, the Lava class has not yet been loaded. Like most (perhaps all) virtual machine implementations, it does not wait until every class used by the program is loaded before it starts running. Instead, it loads each class only when it is needed.

The first instruction of main() tells the virtual machine to allocate enough memory for the class listed in the first item of the constant pool. So the virtual machine uses the pointer to the Volcano constant pool to find the first item, finds that it is a symbolic reference to the Lava class, and then it checks the method area to see if the Lava class has been loaded.

This symbolic reference is simply a string giving the fully qualified name "Lava" of the class Lava. In order for the virtual machine to find a class from a name as quickly as possible, the designer of the virtual machine should choose the best data structures and algorithms.

When the virtual machine finds that the class named "Lava" has not been loaded, it locates and reads the file "Lava.class" and places the type information extracted from the binary data into the method area.

Immediately afterwards, the virtual machine replaces the first item of the constant pool (the string "Lava") with a pointer that points directly to the Lava class data in the method area. This pointer gives fast access to the Lava class from then on. This replacement process is called constant pool resolution: symbolic references in the constant pool are replaced with direct references.

Finally, the virtual machine is ready to allocate memory for a new Lava object. At this point it needs the information in the method area again. Remember the pointer you just put into the first item of the Volcano class constant pool? Now the virtual machine uses it to access Lava type information and find out the information recorded in it: how much heap space a Lava object needs to allocate.

The Java virtual machine can always determine how much memory an object requires from the type information stored in the method area. Once it has determined the size of a Lava object, it allocates that much space on the heap and initializes the instance variable speed to its default initial value, 0.

When the reference to the newly created Lava object has been pushed onto the stack, the first instruction of main() is complete. The following instructions use this reference to invoke the Java code that initializes speed to its correct initial value, 5. A further instruction uses the reference to call the flow() method on the Lava object.
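The two-step setup of speed (default value 0 first, then the initializer value 5) can be observed if some code reads the field before the initializer has run. The sketch below does this with a superclass constructor that calls an overridden method, which is used here purely as an observation trick and is not recommended practice; Rock, earlySpeed, and finalSpeed are hypothetical names added for illustration.

```java
public class DefaultThenInitDemo {
    /** Hypothetical superclass whose constructor peeks at the subclass field early. */
    static class Rock {
        final int speedSeenEarly;

        Rock() {
            speedSeenEarly = readSpeed(); // runs before Lava's field initializer
        }

        int readSpeed() {
            return -1;
        }
    }

    static class Lava extends Rock {
        private int speed = 5;

        @Override
        int readSpeed() {
            return speed;
        }
    }

    /** Value of speed right after allocation, before "speed = 5" has executed. */
    public static int earlySpeed() {
        return new Lava().speedSeenEarly;
    }

    /** Value of speed once construction has completed. */
    public static int finalSpeed() {
        return new Lava().readSpeed();
    }

    public static void main(String[] args) {
        System.out.println("during construction: " + earlySpeed());
        System.out.println("after construction:  " + finalSpeed());
    }
}
```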

Heap

All class instances or arrays created by a Java program during runtime are placed in the same heap. There is only one heap space in a JAVA virtual machine instance, so all threads will share this heap. And because a Java program occupies a JAVA virtual machine instance, each Java program has its own heap space - they will not interfere with each other. However, multiple threads of the same Java program share the same heap space. In this case, the synchronization issue of multi-thread access to objects (heap data) must be considered.
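Because the heap is shared, two threads that update the same object must coordinate. A minimal sketch of one way to do so with synchronized methods follows; SharedHeapDemo, Counter, and countWithTwoThreads are illustrative names.

```java
public class SharedHeapDemo {
    /** A heap object that both threads reach through the same reference. */
    static class Counter {
        private int value;

        synchronized void increment() {
            value++;
        }

        synchronized int get() {
            return value;
        }
    }

    /** Lets two threads increment one shared Counter and returns the final count. */
    public static int countWithTwoThreads(int incrementsPerThread) {
        Counter shared = new Counter();
        Runnable task = () -> {
            for (int i = 0; i < incrementsPerThread; i++) {
                shared.increment();
            }
        };
        Thread first = new Thread(task);
        Thread second = new Thread(task);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithTwoThreads(1000));
    }
}
```

Without the synchronized keyword, the two threads could interleave their read-modify-write steps and lose updates.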

The Java virtual machine has an instruction that allocates new objects on the heap but no instruction that frees memory, just as you cannot explicitly free an object in Java code. The virtual machine itself decides how and when to reclaim the memory occupied by objects that the running program no longer references. Usually it delegates this task to the garbage collector.

The internal representation of arrays

In Java, arrays are real objects. Like other objects, arrays are always stored in the heap. Likewise, arrays have a Class instance associated with their class, and all arrays with the same dimensions and type are instances of the same class, regardless of the length of the array (the length of each dimension of a multidimensional array). For example, an array containing 3 integers and an array containing 300 integers have the same class. The length of the array is only relevant to the instance data.

The name of an array class has two parts: one opening square bracket "[" per dimension, followed by a character or string that denotes the element type. For example, the class name of a one-dimensional array of int is "[I", the class name of a three-dimensional array of byte is "[[[B", and the class name of a two-dimensional array of Object is "[[Ljava/lang/Object;".
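These array class names can be printed from a running program. Note one difference in this sketch: Class.getName() separates package names with dots, whereas the form used inside class files uses slashes. ArrayClassDemo and nameOf are illustrative names.

```java
public class ArrayClassDemo {
    /** Returns the run-time class name of an array object. */
    public static String nameOf(Object array) {
        return array.getClass().getName();
    }

    public static void main(String[] args) {
        // Length is instance data, so both arrays share one class.
        System.out.println(new int[3].getClass() == new int[300].getClass()); // true

        System.out.println(nameOf(new int[3]));        // [I
        System.out.println(nameOf(new byte[1][1][1])); // [[[B
        System.out.println(nameOf(new Object[1][1]));  // [[Ljava.lang.Object;
    }
}
```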

Multidimensional arrays are represented as arrays of arrays. For example, a two-dimensional array of type int will be represented as a one-dimensional array, in which each element is a reference to a one-dimensional int array, as shown below:

 

Each array object in the heap must also store the length of the array, the array data, and some reference to the class data of the array. The virtual machine must be able to obtain the length of the array through a reference to an array object, access its elements through indexes (during which the array boundaries must be checked to see if they are out of bounds), call methods declared by the direct superclass Object of all arrays, and so on.
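The array-of-arrays layout and the bounds check can both be seen from Java code. In this sketch the rows are given different lengths to show that each row is an independent array object; ArrayLayoutDemo and isOutOfBounds are illustrative names.

```java
public class ArrayLayoutDemo {
    /** Reports whether reading array[index] is rejected by the bounds check. */
    public static boolean isOutOfBounds(int[] array, int index) {
        try {
            int ignored = array[index];
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[][] grid = new int[2][];   // one-dimensional array of references
        grid[0] = new int[3];          // each element refers to its own int[]
        grid[1] = new int[5];

        System.out.println(grid.length + " rows, lengths " + grid[0].length + " and " + grid[1].length);
        System.out.println(grid[0].getClass() == int[].class);
        System.out.println(isOutOfBounds(grid[0], 3));  // index 3 is past the end of a 3-element row
        System.out.println(isOutOfBounds(grid[1], 3));  // but valid in the 5-element row
    }
}
```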

Program Counter

For a running Java program, each thread has its own PC (program counter) register, which is created when the thread starts. The PC register is one word in size, so it can hold either a native pointer or a returnAddress. While a thread executes a Java method, the PC register contains the "address" of the next instruction to be executed. This "address" may be a native pointer or an offset from the start of the method's bytecode. If the thread is executing a native method, the value of the PC register is "undefined".

Java stack

Whenever a new thread is started, the Java virtual machine allocates a Java stack for it. The Java stack stores the thread's state in frames. The virtual machine performs only two operations directly on the Java stack: pushing frames and popping frames.

The method a thread is currently executing is called the thread's current method. The stack frame used by the current method is the current frame, the class to which the current method belongs is the current class, and the constant pool of the current class is the current constant pool. While a thread executes a method, it keeps track of the current class and the current constant pool. When the virtual machine encounters an instruction that operates on stack data, it performs the operation on the data in the current frame.

Whenever a thread calls a Java method, the virtual machine pushes a new frame into the thread's Java stack. And this new frame naturally becomes the current frame. When executing this method, it uses this frame to store parameters, local variables, intermediate operation results and other data.

A Java method can complete in two ways. One is by returning with return, which is called a normal return; the other is by throwing an exception, which is an abnormal termination. Either way, the virtual machine pops the current frame off the Java stack and discards it, so that the frame of the previous method becomes the current frame.

All data in a Java stack is private to its thread. No thread can access another thread's stack data, so there is no need to synchronize access to stack data in multithreaded programs. When a thread calls a method, the method's local variables are stored in a frame on the calling thread's Java stack. Only one thread can ever access those local variables: the thread that called the method.
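The privacy of stack data means that two threads can run the same method at the same time without locking its local variables. A minimal sketch follows; ThreadStackDemo, sumTo, and runConcurrently are illustrative names.

```java
public class ThreadStackDemo {
    /** Sums 1..n using only local variables, which live in the calling thread's frame. */
    static long sumTo(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    /** Runs sumTo on two threads at once; each call has its own frame and its own locals. */
    public static long[] runConcurrently(int first, int second) {
        long[] results = new long[2];  // heap object; each thread writes only its own slot
        Thread a = new Thread(() -> results[0] = sumTo(first));
        Thread b = new Thread(() -> results[1] = sumTo(second));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        long[] results = runConcurrently(100, 1000);
        System.out.println(results[0] + " " + results[1]);
    }
}
```

No synchronized keyword is needed inside sumTo, because total and i exist once per invocation rather than once per program.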

Native method stack

All the runtime data areas described so far are clearly defined in the Java virtual machine specification. A running Java program may also use data areas associated with native methods. When a thread calls a native method, it enters a new world that is no longer constrained by the virtual machine. A native method can access the virtual machine's runtime data areas through the native method interface, but beyond that it can do whatever it wants.

Native methods are essentially implementation-dependent, and designers of virtual machine implementations are free to decide what mechanism to use to allow Java programs to call native methods.
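From the Java side, a native method is recognizable only by its native modifier, which reflection can report. The sketch below checks Object.getClass(), whose declaration shown earlier in this article carries that modifier; NativeMethodDemo and isNative are illustrative names, and whether a given library method is native can differ between Java implementations.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class NativeMethodDemo {
    /** Reports whether the public no-argument method of the given name is declared native. */
    public static boolean isNative(Class<?> type, String methodName) {
        try {
            Method method = type.getMethod(methodName);
            return Modifier.isNative(method.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Object.getClass native: " + isNative(Object.class, "getClass"));
        System.out.println("String.length native:   " + isNative(String.class, "length"));
    }
}
```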

Any native method interface will use some kind of native method stack. When a thread calls a Java method, the virtual machine creates a new stack frame and pushes it onto the Java stack. When it calls a native method, however, the virtual machine leaves the Java stack unchanged and does not push a new frame onto the thread's Java stack. It simply links dynamically to the specified native method and calls it directly.

If a virtual machine's native method interface uses the C linkage model, its native method stack is the C stack. When a C program calls a C function, the stack operations are well defined: the arguments are pushed onto the stack in a certain order, and the return value is passed back to the caller in a certain way. This is also how the native method stack behaves in such a virtual machine implementation.

A native method may well need to call back into a Java method in the Java virtual machine. In that case, the thread saves the state of the native method stack and enters another Java stack.

The following figure depicts a scenario in which a thread calls a native method and the native method calls back into another Java method in the virtual machine. The picture gives a panoramic view of a thread running inside the Java virtual machine. A thread may spend its whole life executing Java methods and working with its Java stack, or it may jump freely between the Java stack and the native method stack.

The thread first called two Java methods, and the second Java method called a native method, which caused the virtual machine to use a native method stack. Suppose this is a C stack that holds two C functions. The first C function was called as a native method by the second Java method, and it in turn called the second C function. The second C function then called back a Java method (the third Java method) through the native method interface, and finally that Java method called another Java method, which is the current method in the figure.
