A detailed introduction to JVM bytecode in Java
This article is part of a series on Java basics (the JVM). I originally planned to start with the Java class loading mechanism, but the JVM's job is to load the bytecode produced by the compiler and translate it into machine code. It makes more sense to understand bytecode first and then discuss the class loading mechanism that loads it, so this article is a detailed look at bytecode instead.
Because all Java code lives inside classes, bytecode that can represent a class can represent an entire Java program, and a JVM that can load a class can load an entire program. So for both bytecode and the JVM's loading mechanism, the focus is on classes. The questions I want to answer are:
1. Since the bytecode is not loaded into memory all at once, how does the JVM know where the class information it wants to load is located in the .class file?
2. How does bytecode represent class information?
3. Will the bytecode optimize the program?
The first question is simple: even if a source file declares several classes (at most one of which can be public), the compiler generates a separate .class file for each class, and the JVM loads each class on demand by looking up the .class file that matches the class name.
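As a rough illustration of that lookup, the sketch below maps a class name to the relative path of its .class file. The helper name `toClassFilePath` is hypothetical; real class loaders perform a comparable translation internally before reading the bytes.

```java
// Hypothetical helper for illustration only, not a JDK API.
class ClassFileNames {
    // Converts a binary class name such as "com.test.main1.Extend"
    // to the relative path of the file that holds its bytecode.
    static String toClassFilePath(String binaryName) {
        return binaryName.replace('.', '/') + ".class";
    }

    public static void main(String[] args) {
        // The two classes declared in ByteCodeTest.java end up in two files.
        System.out.println(toClassFilePath("com.test.main1.ByteCodeTest"));
        System.out.println(toClassFilePath("com.test.main1.Extend"));
    }
}
```

Each class gets its own path, which is why the JVM never has to search inside a .class file for the class it wants.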
To answer the other two questions, let's first look at how a bytecode file is laid out (opened here with Hex Fiend on a Mac).
For this piece of code:
package com.test.main1;

public class ByteCodeTest {
    int num1 = 1;
    int num2 = 2;

    public int getAdd() {
        return num1 + num2;
    }
}

class Extend extends ByteCodeTest {
    public int getSubstract() {
        return num1 - num2;
    }
}
Let’s analyze the Extend class in it.
Use Hex Fiend to open the compiled .class file like this (hexadecimal code):
Because a class file has no delimiters, the format strictly specifies what each position means and how long each part is; see the table below:
Here u1, u2, u4, and u8 denote unsigned values of 1, 2, 4, and 8 bytes. In the hex dump, two hexadecimal digits make up one byte, that is, one u1.
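A minimal sketch of reading these quantities in Java follows. It assumes the bytes are already in memory and uses `java.nio.ByteBuffer`, which is big-endian by default, matching the class file format. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: reads the unsigned big-endian values that the
// class file format calls u1, u2 and u4.
class UnsignedReader {
    static int u1(ByteBuffer buf) { return Byte.toUnsignedInt(buf.get()); }
    static int u2(ByteBuffer buf) { return Short.toUnsignedInt(buf.getShort()); }
    static long u4(ByteBuffer buf) { return Integer.toUnsignedLong(buf.getInt()); }

    public static void main(String[] args) {
        // Sample bytes: 07 0002 followed by a four-byte value.
        byte[] sample = {0x07, 0x00, 0x02, (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        ByteBuffer buf = ByteBuffer.wrap(sample);
        System.out.println(u1(buf));                   // 7
        System.out.println(u2(buf));                   // 2
        System.out.println(Long.toHexString(u4(buf))); // cafebabe
    }
}
```

Java has no unsigned primitive types, so each value is widened to the next larger signed type to keep it non-negative.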
Look at it one by one from beginning to end:
(1) magic: u4, the magic number, which identifies the file as a .class file. Other formats such as .jpg have their own magic numbers. Because the file type is recognized from the magic number rather than the extension, a *.jpg file renamed to *.123 can still be opened as usual.
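For class files the magic number is 0xCAFEBABE. The sketch below checks the first four bytes of a file against it; the helper name is hypothetical, and the JPEG bytes are included only as a contrasting example of a different magic number.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: recognizes a class file by its first four bytes.
class MagicCheck {
    static final int CLASS_MAGIC = 0xCAFEBABE;

    static boolean looksLikeClassFile(byte[] head) {
        // ByteBuffer reads big-endian, the byte order used by class files.
        return head.length >= 4 && ByteBuffer.wrap(head).getInt() == CLASS_MAGIC;
    }

    public static void main(String[] args) {
        byte[] classHead = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        byte[] jpegHead = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        System.out.println(looksLikeClassFile(classHead)); // true
        System.out.println(looksLikeClassFile(jpegHead));  // false
    }
}
```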
(2) minor_version, major_version: u2 each, the class file version number. Versions are backward compatible: a newer JDK can use .class files compiled for an older version, but not vice versa.
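The compatibility rule can be sketched as a simple comparison of major versions. This is a simplification (a real JVM also considers the minor version), the method names are hypothetical, and the header bytes in `main` are an assumed example for a Java 8 class file, not taken from the dump in this article.

```java
import java.nio.ByteBuffer;

// Hypothetical helper illustrating the version fields.
class ClassVersion {
    // For major versions 49 and above, the Java release is major - 44
    // (49 -> Java 5, 52 -> Java 8, 61 -> Java 17).
    static int javaRelease(int major) {
        if (major < 49) throw new IllegalArgumentException("older than Java 5: " + major);
        return major - 44;
    }

    // Simplified rule: a JVM accepts a class file whose major version
    // is not newer than the highest one the JVM supports.
    static boolean canLoad(int classMajor, int jvmMaxMajor) {
        return classMajor <= jvmMaxMajor;
    }

    public static void main(String[] args) {
        // Assumed header: magic, minor_version 0x0000, major_version 0x0034 (52).
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34};
        ByteBuffer buf = ByteBuffer.wrap(header);
        buf.getInt(); // skip the magic number
        int minor = Short.toUnsignedInt(buf.getShort());
        int major = Short.toUnsignedInt(buf.getShort());
        System.out.println("minor=" + minor + " major=" + major + " -> Java " + javaRelease(major));
        System.out.println(canLoad(major, 61)); // a Java 17 JVM can load it
        System.out.println(canLoad(61, major)); // a Java 8 JVM cannot load a Java 17 class
    }
}
```

When the check fails in a real JVM, loading the class ends with an UnsupportedClassVersionError.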
(3) constant_pool_count: u2, the number of entries in the constant pool plus one. Here it is 0019, which is 25 in decimal.
(4) Next come the constants themselves, constant_pool_count-1 of them in total (24 here), indexed starting from 1.
The constant pool usually stores two kinds of data:
Literals: such as strings and constants declared final;
Symbolic references: such as the fully qualified names of classes and interfaces, method names and descriptors, and field names and descriptors.
Each constant in the hex dump starts with a one-byte tag. Look the tag up in the table below to get the constant's type and length; the bytes that follow, as many as that length, are the constant's value.
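That lookup can be sketched in Java as follows. The tag values and payload sizes are the standard ones for the original set of constant types; the class and method names are hypothetical, and newer tags (such as MethodHandle or InvokeDynamic) are left out to keep the sketch short.

```java
// Hypothetical lookup helper for constant pool tags.
class ConstantTags {
    static String name(int tag) {
        switch (tag) {
            case 1:  return "CONSTANT_Utf8_info";
            case 3:  return "CONSTANT_Integer_info";
            case 4:  return "CONSTANT_Float_info";
            case 5:  return "CONSTANT_Long_info";
            case 6:  return "CONSTANT_Double_info";
            case 7:  return "CONSTANT_Class_info";
            case 8:  return "CONSTANT_String_info";
            case 9:  return "CONSTANT_Fieldref_info";
            case 10: return "CONSTANT_Methodref_info";
            case 11: return "CONSTANT_InterfaceMethodref_info";
            case 12: return "CONSTANT_NameAndType_info";
            default: throw new IllegalArgumentException("tag not covered by this sketch: " + tag);
        }
    }

    // Number of bytes after the tag; -1 means variable length
    // (Utf8 carries its own u2 length followed by that many bytes).
    static int payloadLength(int tag) {
        switch (tag) {
            case 1:  return -1;
            case 3: case 4: return 4;
            case 5: case 6: return 8;
            case 7: case 8: return 2;
            case 9: case 10: case 11: case 12: return 4;
            default: throw new IllegalArgumentException("tag not covered by this sketch: " + tag);
        }
    }

    public static void main(String[] args) {
        int tag = 0x07; // first byte of the entry 070002
        System.out.println(name(tag) + ", payload bytes: " + payloadLength(tag));
    }
}
```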
Take 070002 as an example. The u1 tag 07 means the entry is a CONSTANT_Class_info, and the u2 that follows, 0002, is the index of the constant pool entry that holds the fully qualified class name. It helps to read the hex dump side by side with the output of javap -verbose for the class file, which lists the contents of the constant pool in order:
There you can see that the constant at index 0002 is com/test/main1/Extend, the fully qualified name of the class. If a value is a string, convert each byte to decimal and look it up in the ASCII table to get the characters. The remaining constants decode as follows:
01001563 6F6D2F74 6573742F 6D61696E 312F4578 74656E64: com/test/main1/Extend
070004: com/test/main1/ByteCodeTest
01001B63 6F6D2F74 6573742F 6D61696E 312F4279 7465436F 64655465 7374: com/test/main1/ByteCodeTest
0100063C 696E6974 3E: <init>
01000328 2956: ()V
01000443 6F6465: Code
0A000300 09: com/test/main1/ByteCodeTest, "<init>":()V
0C000500 06: <init>, ()V
01000F4C 696E654E 756D6265 72546162 6C65: LineNumberTable
0100124C 6F63616C 56617269 61626C65 5461626C 65: LocalVariableTable
01000474 686973: this
0100174C 636F6D2F 74657374 2F6D6169 6E312F45 7874656E 643B: Lcom/test/main1/Extend;
01000C67 65745375 62737472 616374: getSubstract
01000328 2949: ()I
09000100 11: com/test/main1/Extend, num1:I
0C001200 13: num1, I
0100046E 756D31: num1
01000149: I
09000100 15: com/test/main1/Extend, num2:I
0C001600 13: num2, I
0100046E 756D32: num2
01000A53 6F757263 6546696C 65: SourceFile
01001142 79746543 6F646554 6573742E 6A617661: ByteCodeTest.java
At this point, all the constants in the constant pool have been parsed.
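The manual decoding above can be reproduced with a small parser. This is a minimal sketch, not a complete class file reader: it handles only the five tags that occur in this pool (1, 7, 9, 10, 12), the class and method names are hypothetical, and `POOL_HEX` is the constant pool transcribed from the hex values listed above, starting at constant_pool_count.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: walks the constant pool of Extend.class.
class ConstantPoolWalk {
    static final String POOL_HEX = "0019"
        + "070002" + "010015636F6D2F746573742F6D61696E312F457874656E64"
        + "070004" + "01001B636F6D2F746573742F6D61696E312F42797465436F646554657374"
        + "0100063C696E69743E" + "010003282956" + "010004436F6465"
        + "0A00030009" + "0C00050006"
        + "01000F4C696E654E756D6265725461626C65"
        + "0100124C6F63616C5661726961626C655461626C65"
        + "01000474686973"
        + "0100174C636F6D2F746573742F6D61696E312F457874656E643B"
        + "01000C676574537562737472616374" + "010003282949"
        + "0900010011" + "0C00120013" + "0100046E756D31" + "01000149"
        + "0900010015" + "0C00160013" + "0100046E756D32"
        + "01000A536F7572636546696C65"
        + "01001142797465436F6465546573742E6A617661";

    static byte[] hex(String s) {
        String clean = s.replaceAll("\\s+", "");
        byte[] out = new byte[clean.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(clean.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    static int u2(ByteBuffer buf) { return Short.toUnsignedInt(buf.getShort()); }

    // Slot i holds a String for Utf8 entries, or int[]{tag, index...} otherwise.
    static Object[] parse(ByteBuffer buf) {
        int count = u2(buf);              // constant_pool_count
        Object[] pool = new Object[count]; // slot 0 stays unused
        for (int i = 1; i < count; i++) {
            int tag = Byte.toUnsignedInt(buf.get());
            switch (tag) {
                case 1: { // Utf8: u2 length, then the bytes (ASCII in this pool)
                    byte[] bytes = new byte[u2(buf)];
                    buf.get(bytes);
                    pool[i] = new String(bytes, StandardCharsets.UTF_8);
                    break;
                }
                case 7: // Class: name_index
                    pool[i] = new int[] {tag, u2(buf)};
                    break;
                case 9: case 10: case 12: // Fieldref, Methodref, NameAndType: two indexes
                    pool[i] = new int[] {tag, u2(buf), u2(buf)};
                    break;
                default:
                    throw new IllegalArgumentException("tag not handled in this sketch: " + tag);
            }
        }
        return pool;
    }

    // Follows the indexes until only strings remain.
    static String resolve(Object[] pool, int index) {
        Object entry = pool[index];
        if (entry instanceof String) return (String) entry;
        int[] e = (int[]) entry;
        if (e[0] == 7) return resolve(pool, e[1]);
        String separator = (e[0] == 12) ? ":" : ".";
        return resolve(pool, e[1]) + separator + resolve(pool, e[2]);
    }

    public static void main(String[] args) {
        Object[] pool = parse(ByteBuffer.wrap(hex(POOL_HEX)));
        for (int i = 1; i < pool.length; i++) {
            System.out.println("#" + i + " = " + resolve(pool, i));
        }
    }
}
```

Storing the raw indexes first and resolving them afterwards matters because an entry may refer to a later one: #1 points at #2, which has not been read yet when #1 is parsed.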
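(5) access_flags: u2. The access flags mark whether this is a class or an interface and, if it is a class, whether it is public, abstract, final, and so on; see the table below:

| Flag name | Value | Interpretation |
| --- | --- | --- |
| ACC_PUBLIC | 0x0001 | Declared public; can be accessed from outside its package |
| ACC_FINAL | 0x0010 | Declared final; no subclasses are allowed |
| ACC_SUPER | 0x0020 | Special: superclass methods are bound dynamically to the direct superclass; see the explanation below |
| ACC_INTERFACE | 0x0200 | An interface, not a class |
| ACC_ABSTRACT | 0x0400 | An abstract class; cannot be instantiated |
| ACC_SYNTHETIC | 0x1000 | Synthetic: generated by the compiler and not present in the source code; see appendix [2] |
| ACC_ANNOTATION | 0x2000 | An annotation type |
| ACC_ENUM | 0x4000 | An enumeration type |

The access_flags of this class is 0020. This means that when Extend calls a superclass method, the call is not bound at compile time. Instead, the class hierarchy is searched at run time and the nearest superclass is called. This guarantees that the call always reaches the nearest superclass rather than the superclass bound at compile time, so the result is correct.
(6) this_class: u2, the class index, used to determine the fully qualified name of the class. Here this_class is 0001, which is constant pool entry #1: com/test/main1/Extend.
(7) super_class: u2, the superclass index, used to determine the fully qualified name of the direct superclass. Here it is 0003, and #3 is com/test/main1/ByteCodeTest.
(8) interfaces_count: u2, the number of interfaces the class implements. Note that only directly implemented interfaces are counted. Here it is 0000: the class implements no interfaces.
(9) interfaces: the constant pool indexes of the interfaces' fully qualified names, one u2 per interface, interfaces_count in total. Empty for this class.
(10) fields_count: u2, the total number of class variables and instance variables. Here it is 0000: none (num1 and num2 are declared in ByteCodeTest, not in Extend).
(11) fields: an array of field_info entries. field_info is a composite structure:

    field_info {
        u2 access_flags;
        u2 name_index;
        u2 descriptor_index;
        u2 attributes_count;
        attribute_info attributes[attributes_count];
    }

Since this class declares no class variables or instance variables, this part is empty.
(12) methods_count: u2, the number of methods. Here it is 0002: two methods.
(13) methods: an array of method_info entries. Annotated with the values for the first method:

    method_info {
        u2 access_flags;        0000    no flags set (package-private)
        u2 name_index;          0005    <init>
        u2 descriptor_index;    0006    ()V
        u2 attributes_count;    0001    1 attribute
        attribute_info attributes[attributes_count];    0007    Code
    }

The attribute_info structure is:

    attribute_info {
        u2 attribute_name_index;    0007    Code
        u4 attribute_length;
        u1 info[attribute_length];
    }

That is the generic definition of attribute_info. In addition, the JVM predefines several attributes, and Code is one of them (note that a predefined attribute follows its own predefined structure instead of the generic one). Its structure is:

    Code_attribute {    // holds the JVM instructions and auxiliary information for a method, an instance initialization method, or a class or interface initialization method
        u2 attribute_name_index;    0007        Code
        u4 attribute_length;        0000002F    47
        u2 max_stack;               0001        1    // maximum depth of the operand stack at any point during execution of the method
        u2 max_locals;              0001        1    // number of local variables in the local variable array allocated for the method
        u4 code_length;             00000005    5    // number of bytes in the code[] array
        u1 code[code_length];       2AB70008 B1    42, 183, 0, 8, 177    // the actual bytes of JVM code that implement the method (the numbers correspond to JVM instructions)
        u2 exception_table_length;  0000        0    // exception handler information
        {
            u2 start_pc;      // start_pc and end_pc give the range of code[] in which the handler is active: an offset x is covered when start_pc ≤ x < end_pc
            u2 end_pc;        // start_pc must be a valid index into code[]; end_pc must be a valid index into code[] or equal to code_length
            u2 handler_pc;    // start of the exception handler
            u2 catch_type;    // type of exception the handler catches; if 0, the handler is called for every exception, which is how finally is implemented
        } exception_table[exception_table_length];    // empty for this class
        u2 attributes_count;        0002        2    // number of additional attributes of the method; there are 2 here
        attribute_info attributes[attributes_count];    000A, 000B    LineNumberTable, LocalVariableTable
    }

LineNumberTable and LocalVariableTable are two more predefined attributes. Their structures are:

    LineNumberTable_attribute {    // used by debuggers to determine which part of code[] corresponds to a given line number in the source file
        u2 attribute_name_index;        000A
        u4 attribute_length;            00000006
        u2 line_number_table_length;    0001
        {
            u2 start_pc;       0000
            u2 line_number;    000E    // must match the corresponding line number in the source file
        } line_number_table[line_number_table_length];
    }

and:

    LocalVariableTable_attribute {
        u2 attribute_name_index;           000B
        u4 attribute_length;               0000000C
        u2 local_variable_table_length;    0001
        {
            u2 start_pc;            0000
            u2 length;              0005
            u2 name_index;          000C
            u2 descriptor_index;    000D    // field descriptor giving the type of the local variable in the source program
            u2 index;               0000
        } local_variable_table[local_variable_table_length];
    }

The second method follows; it is skipped here.
(14) attributes_count: u2. These attributes are additional attributes of the class file as a whole and have the same structure as the method attributes above. Here it is 0001.
(15) attributes: the additional attributes of the class file. Here the attribute name index is 0017, which points to constant pool entry #23 (hex 17), SourceFile. The SourceFile structure is:

    SourceFile_attribute {
        u2 attribute_name_index;    0017        SourceFile
        u4 attribute_length;        00000002    2
        u2 sourcefile_index;        0018        ByteCodeTest.java    // this class file was compiled from ByteCodeTest.java
    }

That is about all there is to say on the contents of bytecode. Almost the entire article has been spent analyzing the hexadecimal content of a bytecode file, so it is fair to say that the core of bytecode is its hexadecimal encoding. Parsing it with the rules in the specification yields all the information about the class, including:
1. The version number of the class;
2. The size of its constant pool and the constants in it;
3. Its access flags;
4. Its fully qualified name, the fully qualified name of its direct superclass, and the interfaces it implements directly;
5. Its class variables and instance variables;
6. Its methods;
7. Other additional information, such as the source file it came from.
With the bytecode parsed, the questions raised at the start answer themselves. Because the bytecode file format is strictly specified, it can represent all the information about a class. And bytecode only represents class information; it does not optimize the program.
So does the compiler optimize the program during compilation? Does the JVM do so at run time? When does that happen, and on what principles? That is left for a later article.
Finally, it is worth noting that bytecode is not only platform independent (bytecode generated on any platform can run in any JRE) but also language independent: besides Java, other languages such as Groovy, Jython, and Scala can also generate bytecode that runs in a JRE.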
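As a closing recap, two steps of the walkthrough can be sketched in code: decoding the class-level access_flags value 0020, and turning the five code[] bytes 2AB70008 B1 of the <init> method into instructions. The class and method names are hypothetical. The flag masks are the ones from the access_flags table, and the disassembler covers only the three opcodes that occur in these five bytes (aload_0, invokespecial, return).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recap helper, not a complete decoder.
class ClassFileRecap {
    static final int[] MASKS = {0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000};
    static final String[] NAMES = {"ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
            "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"};

    // Lists the names of all class-level flags set in accessFlags.
    static List<String> decodeFlags(int accessFlags) {
        List<String> set = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) set.add(NAMES[i]);
        }
        return set;
    }

    // Turns code[] bytes into "offset: mnemonic" lines.
    static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            switch (opcode) {
                case 0x2A: // 42: push local variable 0 (this)
                    out.add(pc + ": aload_0");
                    pc += 1;
                    break;
                case 0xB7: { // 183: followed by a u2 constant pool index
                    int index = ((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF);
                    out.add(pc + ": invokespecial #" + index);
                    pc += 3;
                    break;
                }
                case 0xB1: // 177: return from a void method
                    out.add(pc + ": return");
                    pc += 1;
                    break;
                default:
                    throw new IllegalArgumentException("opcode not covered by this sketch: " + opcode);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(decodeFlags(0x0020)); // [ACC_SUPER]
        byte[] init = {0x2A, (byte) 0xB7, 0x00, 0x08, (byte) 0xB1};
        for (String line : disassemble(init)) {
            System.out.println(line);
        }
    }
}
```

The invokespecial operand #8 is the Methodref for the ByteCodeTest constructor, so these five bytes are the implicit super() call of Extend's default constructor.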