1. Preface
Anyone learning Java knows that Java has promoted platform independence from the very beginning, under the slogan "write once, run anywhere". The Java platform also has a second kind of independence: language independence. The Class file structure, that is, the bytecode format, is the key to achieving it. From the start, Java has had two separate specifications: the Java Language Specification and the Java Virtual Machine Specification. The Java Language Specification only defines the constraints and rules of the Java language itself, while the Java Virtual Machine Specification is the one that is truly designed from a cross-platform perspective.
Most programmers probably take it for granted that the Java virtual machine executes Java programs. Today, however, commercial and open source organizations have developed many languages other than Java that run on the Java virtual machine, such as Clojure, Groovy, JRuby, Jython, and Scala. A Java compiler compiles Java code into Class files that store bytecode, and compilers for other languages, such as JRuby's, can compile their own source code into Class files as well. Java can run across platforms because the Java virtual machine on every platform loads and executes the same platform-independent bytecode. In other words, the virtual machine and the bytecode storage format are the foundation of both platform independence and language independence. The virtual machine does not care which language a Class file came from; as long as the file conforms to the Class file structure, it can run on the Java virtual machine.
2. Class file encoding and composition
1) A Class file is a byte stream based on 8-bit bytes. The data items are arranged strictly in the specified order, with no gaps between them. Any data item that occupies more than 8 bits (one byte) is stored in Big-Endian order, that is, the high-order byte is stored at the lower address and the low-order byte at the higher address. This is also one of the keys to making Class files cross-platform: the PowerPC architecture uses Big-Endian order while x86 series processors use Little-Endian order, so the virtual machine specification has to fix a single byte order for Class files to be stored identically on every processor architecture.
2) The Class file structure stores data in a pseudo-structure similar to a C struct. There are two main kinds of data items: unsigned numbers and tables. Unsigned numbers are used to express numbers, index references, strings, and so on; u1, u2, u4, and u8 represent unsigned numbers of 1, 2, 4, and 8 bytes respectively. A table is a composite structure made up of several unsigned numbers and other tables. If unsigned numbers and tables are not entirely clear yet, that is fine; they become clearer with the concrete structures that follow. With these two points in mind, let's look at the data that the strictly ordered byte stream of a Class file contains:
ClassFile {
u4 magic;
u2 minor_version;
u2 major_version;
u2 constant_pool_count;
cp_info constant_pool[constant_pool_count-1];
u2 access_flags;
u2 this_class;
u2 super_class;
u2 interfaces_count;
u2 interfaces[interfaces_count];
u2 fields_count;
field_info fields[fields_count];
u2 methods_count;
method_info methods[methods_count];
u2 attributes_count;
attribute_info attributes[attributes_count];
}
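To make the Big-Endian rule and the u2/u4 item types more concrete, here is a minimal sketch in Java. The class name BigEndianDemo and the helper methods readU2 and readU4 are hypothetical names chosen for this illustration; they are not part of any specification.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class BigEndianDemo {

    // Combine two bytes into an unsigned 16-bit value, high-order byte first.
    public static int readU2(byte[] data, int offset) {
        return ((data[offset] & 0xFF) << 8) | (data[offset + 1] & 0xFF);
    }

    // Combine four bytes into an unsigned 32-bit value. The result is a long
    // so that values above Integer.MAX_VALUE stay positive.
    public static long readU4(byte[] data, int offset) {
        return ((long) readU2(data, offset) << 16) | readU2(data, offset + 2);
    }

    public static void main(String[] args) throws IOException {
        // A u2 with value 0x0034 followed by a u4 with value 0xCAFEBABE.
        byte[] data = {0x00, 0x34, (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};

        System.out.println(readU2(data, 0));                   // prints 52
        System.out.println(Long.toHexString(readU4(data, 2))); // prints cafebabe

        // DataInputStream reads multi-byte values high-order byte first as well,
        // so it returns the same number for the first two bytes.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        System.out.println(in.readUnsignedShort());            // prints 52
    }
}
```

Because the byte order is fixed by the file format, the same code gives the same values on a Big-Endian and on a Little-Endian processor.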
When reading the ClassFile structure above, one point needs attention. cp_info represents the constant pool, and constant_pool[constant_pool_count-1] indicates that the constant pool holds constant_pool_count-1 constants. Array notation is used here, but do not assume that every constant in the constant pool has the same length. The array notation is only a convenience for description; it is not like an int array in a programming language, where every element has the same length.
3. Detailed introduction to each part of the Class file structure
1) u4 magic is the magic number, which occupies 4 bytes. What is the magic number for? It identifies the file as a Class file rather than, say, a JPG picture or an AVI movie. The magic number of a Class file is 0xCAFEBABE.
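As a rough illustration, a magic number check could look like the sketch below. The class name MagicCheck and the method hasClassMagic are hypothetical, and the second byte array is just an arbitrary example of bytes that do not form a Class file header.

```java
public class MagicCheck {

    // Magic number of a Class file, as given in the text.
    public static final int CLASS_MAGIC = 0xCAFEBABE;

    // Returns true when the first four bytes, read in Big-Endian order,
    // equal 0xCAFEBABE.
    public static boolean hasClassMagic(byte[] data) {
        if (data.length < 4) {
            return false;
        }
        int magic = ((data[0] & 0xFF) << 24)
                | ((data[1] & 0xFF) << 16)
                | ((data[2] & 0xFF) << 8)
                | (data[3] & 0xFF);
        return magic == CLASS_MAGIC;
    }

    public static void main(String[] args) {
        byte[] classHeader = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        // Arbitrary bytes that are not a Class file header.
        byte[] otherHeader = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};

        System.out.println(hasClassMagic(classHeader)); // prints true
        System.out.println(hasClassMagic(otherHeader)); // prints false
    }
}
```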
2) u2 minor_version is the minor version number of the Class file, stored as an unsigned number of type u2.
3) u2 major_version is the major version number of the Class file, also stored as an unsigned number of type u2.
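The two version fields sit directly after the magic number, at byte offsets 4 and 6. The following sketch reads them from a hand-built header and applies a simplified acceptance rule: a virtual machine accepts a Class file whose major version is not higher than the highest one it supports. The class name VersionCheck and its methods are hypothetical, and the concrete numbers (major version 49 for Java SE 5.0 and 50 for Java SE 6) come from the Java Virtual Machine Specification rather than from this article.

```java
public class VersionCheck {

    // Reads the u2 stored at the given offset, high-order byte first.
    private static int u2(byte[] data, int offset) {
        return ((data[offset] & 0xFF) << 8) | (data[offset + 1] & 0xFF);
    }

    // minor_version follows the 4-byte magic number.
    public static int minorVersion(byte[] classBytes) {
        return u2(classBytes, 4);
    }

    // major_version follows minor_version.
    public static int majorVersion(byte[] classBytes) {
        return u2(classBytes, 6);
    }

    // Simplified rule for illustration: accept the file when its major
    // version does not exceed the highest one the virtual machine supports.
    public static boolean accepts(int vmMaxMajor, byte[] classBytes) {
        return majorVersion(classBytes) <= vmMaxMajor;
    }

    public static void main(String[] args) {
        // Hand-built header: magic, minor_version 0, major_version 50.
        byte[] java6Class = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
                             0x00, 0x00, 0x00, 0x32};
        int java5Vm = 49; // highest major version assumed for a Java SE 5.0 VM
        int java6Vm = 50; // highest major version assumed for a Java SE 6 VM

        System.out.println(majorVersion(java6Class));      // prints 50
        System.out.println(accepts(java6Vm, java6Class));  // prints true
        System.out.println(accepts(java5Vm, java6Class));  // prints false
    }
}
```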
major_version and minor_version determine whether a given virtual machine accepts the Class file. Different versions of the Java compiler produce Class files with different version numbers. A higher-version virtual machine supports Class files produced by lower-version compilers, but not the other way round: for example, the virtual machine of Java SE 6.0 supports Class files compiled by the Java SE 5.0 compiler, whereas the Java SE 5.0 virtual machine does not support Class files compiled by the Java SE 6.0 compiler.
4) u2 constant_pool_count gives the size of the constant pool; as noted above, the pool holds constant_pool_count-1 constants. This is a good place to explain what the constant pool is. Do not confuse it with the runtime constant pool in the JVM memory model. The constant pool in a Class file mainly stores literals and symbolic references. Literals include strings, the values of final constants, the initial values of fields, and so on. Symbolic references include the fully qualified names of classes and interfaces, the names and descriptors of fields, and the names and descriptors of methods. Names are easy to understand; descriptors come up again with the field table and the method table below. In addition, the JVM memory model consists of the heap, the stack, the method area, and the program counter, and one part of the method area is called the runtime constant pool. The runtime constant pool holds the literals and symbolic references generated by the compiler, but it is dynamic: other constants can be added to it at runtime. The most typical example is the intern method of String.
5) cp_info represents the constant pool, which contains the literals and symbolic references mentioned above. The entries in the constant pool come in 14 kinds of constants. Each constant is itself a table, and every constant starts with a common tag field that indicates which type of constant it is.
| Constant Type | Value |
| --- | --- |
| CONSTANT_Class | 7 |
| CONSTANT_Fieldref | 9 |
| CONSTANT_Methodref | 10 |
| CONSTANT_InterfaceMethodref | 11 |
| CONSTANT_String | 8 |
| CONSTANT_Integer | 3 |
| CONSTANT_Float | 4 |
| CONSTANT_Long | 5 |
| CONSTANT_Double | 6 |
| CONSTANT_NameAndType | 12 |
| CONSTANT_Utf8 | 1 |
| CONSTANT_MethodHandle | 15 |
| CONSTANT_MethodType | 16 |
| CONSTANT_InvokeDynamic | 18 |
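Since every constant starts with its tag, a reader can walk the constant pool entry by entry even though the entries differ in length. The sketch below does that for the 14 tags listed here. The class name ConstantPoolWalker and the method readTags are hypothetical; the size of each entry after its tag, and the rule that a Long or Double constant occupies two constant pool slots, are taken from the Java Virtual Machine Specification rather than from this article.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ConstantPoolWalker {

    // Walks the constant pool and returns the tag found at each index.
    // Index 0 is unused, and the second slot of a Long or Double stays 0.
    public static int[] readTags(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // Big-Endian by default
        buf.position(8); // skip magic (u4), minor_version (u2), major_version (u2)
        int count = buf.getShort() & 0xFFFF; // constant_pool_count
        int[] tags = new int[count];
        for (int i = 1; i < count; i++) {
            int tag = buf.get() & 0xFF;
            tags[i] = tag;
            switch (tag) {
                case 1: // Utf8: u2 length followed by that many bytes
                    skip(buf, buf.getShort() & 0xFFFF);
                    break;
                case 3: case 4: // Integer, Float
                    skip(buf, 4);
                    break;
                case 5: case 6: // Long, Double: 8 bytes and two slots
                    skip(buf, 8);
                    i++;
                    break;
                case 7: case 8: case 16: // Class, String, MethodType
                    skip(buf, 2);
                    break;
                case 9: case 10: case 11: case 12: case 18:
                    // Fieldref, Methodref, InterfaceMethodref, NameAndType, InvokeDynamic
                    skip(buf, 4);
                    break;
                case 15: // MethodHandle: u1 kind plus u2 index
                    skip(buf, 3);
                    break;
                default: // any tag outside the 14 listed in the table
                    throw new IllegalArgumentException("Unknown constant pool tag " + tag);
            }
        }
        return tags;
    }

    private static void skip(ByteBuffer buf, int n) {
        buf.position(buf.position() + n);
    }

    public static void main(String[] args) {
        // Hand-built data: header, constant_pool_count = 5, then
        // #1 Utf8 "Hi", #2 Class -> #1, #3 Long 42 (also occupies #4).
        byte[] data = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34,
            0x00, 0x05,
            0x01, 0x00, 0x02, 0x48, 0x69,
            0x07, 0x00, 0x01,
            0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x2A
        };
        System.out.println(Arrays.toString(readTags(data))); // prints [0, 1, 7, 5, 0]
    }
}
```

The example also shows why the array notation in the ClassFile structure should not be read literally: the three entries occupy 5, 3, and 9 bytes.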
6) u2 access_flags represents the access information of the class or interface, as shown in the table below:
| Flag Name | Value | Interpretation |
| --- | --- | --- |
| ACC_PUBLIC | 0x0001 | Declared public; may be accessed from outside its package. |
| ACC_FINAL | 0x0010 | Declared final; no subclasses allowed. |
| ACC_SUPER | 0x0020 | Treat superclass methods specially when invoked by the invokespecial instruction. |
| ACC_INTERFACE | 0x0200 | Is an interface, not a class. |
| ACC_ABSTRACT | 0x0400 | Declared abstract; must not be instantiated. |
| ACC_SYNTHETIC | 0x1000 | Declared synthetic; not present in the source code. |
| ACC_ANNOTATION | 0x2000 | Declared as an annotation type. |
| ACC_ENUM | 0x4000 | Declared as an enum type. |
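Each flag is a single bit, so one u2 value can carry several flags at once. The following sketch decodes an access_flags value into the flag names from the table. The class name AccessFlags and the method decode are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class AccessFlags {

    // Bit masks and names of the class access flags, in table order.
    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    // Returns the names of all flags whose bit is set in accessFlags.
    public static List<String> decode(int accessFlags) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                result.add(NAMES[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021)); // prints [ACC_PUBLIC, ACC_SUPER]
        System.out.println(decode(0x0601)); // prints [ACC_PUBLIC, ACC_INTERFACE, ACC_ABSTRACT]
    }
}
```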
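7) u2 this_class is the constant pool index of the class itself; it points to a CONSTANT_Class constant in the constant pool.
8) u2 super_class is the index of the superclass; it points to a CONSTANT_Class constant in the constant pool.
9) u2 interfaces_count is the number of interfaces.
10) u2 interfaces[interfaces_count] is the interface table; each of its entries points to a CONSTANT_Class constant in the constant pool.
11) u2 fields_count is the number of instance variables and class variables of the class.
12) field_info fields[fields_count] is the field table. The structure of a field table entry is as follows: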
field_info {
u2 access_flags;
u2 name_index;
u2 descriptor_index;
u2 attributes_count;
attribute_info attributes[attributes_count];
}
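In the structure above, access_flags holds the access flags of the field, such as public, private, and protected. name_index is the name of the field and points to a CONSTANT_Utf8 constant in the constant pool. descriptor_index is the descriptor of the field and also points to a CONSTANT_Utf8 constant in the constant pool. attributes_count is the number of attribute tables attached to the field. An attribute table is an extensible structure used to describe fields, methods, and classes, and different versions of the Java virtual machine support different numbers of attributes.
13) u2 methods_count is the number of entries in the method table.
14) method_info is the method table. The structure of a method table entry is as follows: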
method_info {
u2 access_flags;
u2 name_index;
u2 descriptor_index;
u2 attributes_count;
attribute_info attributes[attributes_count];
}
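Here access_flags holds the access flags of the method, name_index is the index of the method name, and descriptor_index is the descriptor of the method. attributes_count and attribute_info work as they do in the field table, except that the attributes found in the field table and in the method table differ. For example, the method table has the Code attribute, which holds the code of the method, while the field table has no Code attribute.
15) attributes_count is the number of attribute tables. A few points about attribute tables need to be clear: attribute tables appear at the end of the Class file structure, in the field table, in the method table, and inside the Code attribute, which means an attribute table can itself contain attribute tables; and the length of an attribute table is not fixed, since different attributes have different lengths.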
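Because field_info and method_info share the same layout, one routine can read either of them. The sketch below reads the four u2 fields and then skips the attributes without interpreting them. It assumes the attribute_info layout from the Java Virtual Machine Specification, which this article does not spell out: a u2 attribute_name_index, a u4 attribute_length, and then attribute_length bytes of content. The class name MemberReader and the method readMember are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class MemberReader {

    // Reads one field_info or method_info that starts at the given offset.
    // Returns {access_flags, name_index, descriptor_index, attributes_count,
    // offset of the first byte after the member}.
    public static int[] readMember(byte[] data, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(data); // Big-Endian by default
        buf.position(offset);
        int accessFlags = buf.getShort() & 0xFFFF;
        int nameIndex = buf.getShort() & 0xFFFF;
        int descriptorIndex = buf.getShort() & 0xFFFF;
        int attributesCount = buf.getShort() & 0xFFFF;
        for (int i = 0; i < attributesCount; i++) {
            buf.getShort();            // attribute_name_index (u2), ignored here
            int length = buf.getInt(); // attribute_length (u4), assumed to fit in an int
            buf.position(buf.position() + length); // skip the attribute content
        }
        return new int[] {accessFlags, nameIndex, descriptorIndex,
                          attributesCount, buf.position()};
    }

    public static void main(String[] args) {
        // Hand-built member: access_flags 0x0001, name_index 5, descriptor_index 6,
        // one attribute with name index 7 and 3 bytes of content, then one
        // unrelated trailing byte.
        byte[] data = {
            0x00, 0x01, 0x00, 0x05, 0x00, 0x06, 0x00, 0x01,
            0x00, 0x07, 0x00, 0x00, 0x00, 0x03, (byte) 0xAA, (byte) 0xBB, (byte) 0xCC,
            0x7F
        };
        System.out.println(Arrays.toString(readMember(data, 0))); // prints [1, 5, 6, 1, 17]
    }
}
```

The explicit attribute_length is what lets a reader step over attributes it does not understand, which fits the description of attribute tables as an extensible structure.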