Home > Article > Backend Development > What is the pyc file structure of the python virtual machine?
The pyc file is a bytecode file generated by Python when interpreting and executing the source code. It contains the compilation results of the source code. and related metadata information so that Python can load and execute code faster.
Different from compiled languages, Python is an interpreted language and does not directly compile source code into machine code and execute it. Before running the code, the Python interpreter will first compile the source code into bytecode, and then interpret the bytecode for execution. The .pyc file is the bytecode file generated during this process.
When the Python interpreter executes a .py file for the first time, it will generate a corresponding .pyc file in the same directory so that it can be executed faster the next time the file is loaded. When the source file is modified and reloaded, the interpreter regenerates the .pyc file to update the cached bytecode.
Normal python files need to be turned into bytecode through the compiler, and then the bytecode is handed over to the python virtual machine, and then the python virtual machine executes the bytecode. The overall process is as follows:
#We can directly use the compile all module to generate the pyc file of the corresponding file.
➜ pvm ls demo.py hello.py ➜ pvm python -m compileall . Listing '.'... Listing './.idea'... Listing './.idea/inspectionProfiles'... Compiling './demo.py'... Compiling './hello.py'... ➜ pvm ls __pycache__ demo.py hello.py ➜ pvm ls __pycache__ demo.cpython-310.pyc hello.cpython-310.pyc
python -m compileall .
The command will recursively scan the py files in the current directory and generate the pyc file of the corresponding file.
def f(): x = 1 return 2The hexadecimal form of the pyc file is as follows:
➜ __pycache__ hexdump -C hello.cpython-310.pyc 00000000 6f 0d 0d 0a 00 00 00 00 b9 48 21 64 20 00 00 00 |o........H!d ...| 00000010 e3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000020 00 02 00 00 00 40 00 00 00 73 0c 00 00 00 64 00 |.....@...s....d.| 00000030 64 01 84 00 5a 00 64 02 53 00 29 03 63 00 00 00 |d...Z.d.S.).c...| 00000040 00 00 00 00 00 00 00 00 00 01 00 00 00 01 00 00 |................| 00000050 00 43 00 00 00 73 08 00 00 00 64 01 7d 00 64 02 |.C...s....d.}.d.| 00000060 53 00 29 03 4e e9 01 00 00 00 e9 02 00 00 00 a9 |S.).N...........| 00000070 00 29 01 da 01 78 72 03 00 00 00 72 03 00 00 00 |.)...xr....r....| 00000080 fa 0a 2e 2f 68 65 6c 6c 6f 2e 70 79 da 01 66 01 |.../hello.py..f.| 00000090 00 00 00 73 04 00 00 00 04 01 04 01 72 06 00 00 |...s........r...| 000000a0 00 4e 29 01 72 06 00 00 00 72 03 00 00 00 72 03 |.N).r....r....r.| 000000b0 00 00 00 72 03 00 00 00 72 05 00 00 00 da 08 3c |...r....r......<| 000000c0 6d 6f 64 75 6c 65 3e 01 00 00 00 73 02 00 00 00 |module>....s....| 000000d0 0c 00 |..| 000000d2Because of data usage Little endian representation, so for the above data:
import struct import time import binascii fname = "./__pycache__/hello.cpython-310.pyc" f = open(fname, "rb") magic = struct.unpack('<l', f.read(4))[0] bit_filed = f.read(4) print(f"bit field = {binascii.hexlify(bit_filed)}") moddate = f.read(4) filesz = f.read(4) modtime = time.asctime(time.localtime(struct.unpack('<l', moddate)[0])) filesz = struct.unpack('<L', filesz) print("magic %s" % (hex(magic))) print("moddate (%s)" % (modtime)) print("File Size %d" % filesz) f.close()The output of the above code is as follows:
bit field = b'00000000'About pyc For detailed file operations, please view the source code of the python standard library importlib/_bootstrap_external.py file. CODEOBJECTIn CPython,magic 0xa0d0d6f
moddate (Mon Mar 27 15:41:45 2023)
File Size 32
CodeObject is an object, which contains the bytecode, constants, variables, positional parameters, keyword parameters and other information of the Python code , and some metadata used to run the code, such as file name, code line number, etc.
CodeObject and then execute it. During compilation, the interpreter converts Python code into bytecode and saves it in a
CodeObject object. Thereafter, whenever we call that module or function, the interpreter will use the bytecode in
CodeObject to execute the code.
CodeObject Objects are immutable and cannot be modified once created. This is because the bytecode of Python code is immutable, and the
CodeObject object contains these bytecodes, so it is also immutable.
现在举一个例子来分析一下 pycdemo.py 的 pyc 文件,pycdemo.py 的源程序如下所示:
if __name__ == '__main__': a = 100 print(a)
下面的代码是一个用于加载 pycdemo01.cpython-39.pyc 文件(也就是 hello.py 对应的 pyc 文件)的代码,使用 marshal 读取 pyc 文件里面的 code object 。
import marshal import dis import struct import time import types import binascii def print_metadata(fp): magic = struct.unpack('<l', fp.read(4))[0] print(f"magic number = {hex(magic)}") bit_field = struct.unpack('<l', fp.read(4))[0] print(f"bit filed = {bit_field}") t = struct.unpack('<l', fp.read(4))[0] print(f"time = {time.asctime(time.localtime(t))}") file_size = struct.unpack('<l', fp.read(4))[0] print(f"file size = {file_size}") def show_code(code, indent=''): print ("%scode" % indent) indent += ' ' print ("%sargcount %d" % (indent, code.co_argcount)) print ("%snlocals %d" % (indent, code.co_nlocals)) print ("%sstacksize %d" % (indent, code.co_stacksize)) print ("%sflags %04x" % (indent, code.co_flags)) show_hex("code", code.co_code, indent=indent) dis.disassemble(code) print ("%sconsts" % indent) for const in code.co_consts: if type(const) == types.CodeType: show_code(const, indent+' ') else: print(" %s%r" % (indent, const)) print("%snames %r" % (indent, code.co_names)) print("%svarnames %r" % (indent, code.co_varnames)) print("%sfreevars %r" % (indent, code.co_freevars)) print("%scellvars %r" % (indent, code.co_cellvars)) print("%sfilename %r" % (indent, code.co_filename)) print("%sname %r" % (indent, code.co_name)) print("%sfirstlineno %d" % (indent, code.co_firstlineno)) show_hex("lnotab", code.co_lnotab, indent=indent) def show_hex(label, h, indent): h = binascii.hexlify(h) if len(h) < 60: print("%s%s %s" % (indent, label, h)) else: print("%s%s" % (indent, label)) for i in range(0, len(h), 60): print("%s %s" % (indent, h[i:i+60])) if __name__ == '__main__': filename = "./__pycache__/pycdemo01.cpython-39.pyc" with open(filename, "rb") as fp: print_metadata(fp) code_object = marshal.load(fp) show_code(code_object)
执行上面的程序输出结果如下所示:
magic number = 0xa0d0d61 bit filed = 0 time = Tue Mar 28 02:40:20 2023 file size = 54 code argcount 0 nlocals 0 stacksize 2 flags 0040 code b'650064006b02721464015a01650265018301010064025300' 3 0 LOAD_NAME 0 (__name__) 2 LOAD_CONST 0 ('__main__') 4 COMPARE_OP 2 (==) 6 POP_JUMP_IF_FALSE 20 4 8 LOAD_CONST 1 (100) 10 STORE_NAME 1 (a) 5 12 LOAD_NAME 2 (print) 14 LOAD_NAME 1 (a) 16 CALL_FUNCTION 1 18 POP_TOP >> 20 LOAD_CONST 2 (None) 22 RETURN_VALUE consts '__main__' 100 None names ('__name__', 'a', 'print') varnames () freevars () cellvars () filename './pycdemo01.py' name '<module>' firstlineno 3 lnotab b'08010401'
下面是 code object 当中各个字段的作用:
首先需要了解一下代码块这个概念,所谓代码块就是一个小的 python 代码,被当做一个小的单元整体执行。在 Python 中常见的代码块包括函数体、类的定义和模块。
argcount,这个表示一个代码块的参数个数,这个参数只对函数体代码块有用,因为函数可能会有参数,比如上面的 pycdemo.py 是一个模块而不是一个函数,因此这个参数对应的值为 0 。
co_code,这个对象的具体内容就是一个字节序列,存储真实的 python 字节码,主要是用于 python 虚拟机执行的,在本篇文章当中暂时不详细分析。
co_consts,这个字段是一个列表类型的字段,主要是包含一些字符串常量和数值常量,比如上面的 ";main" 和 100 。
co_filename,这个字段的含义就是对应的源文件的文件名。
co_firstlineno,这个字段的含义为在 python 源文件当中第一行代码出现的行数,这个字段在进行调试的时候非常重要。
主要含义是标识该 code object 的类型的字段是 co_flags。0x0080 表示这个 block 是一个协程,0x0010 表示这个 code object 是嵌套的等等。
co_lnotab,这个字段的含义主要是用于计算每个字节码指令对应的源代码行数。
The main purpose of the field "co_varnames" is to indicate a name defined locally in a code object.。
co_names,和 co_varnames 相反,表示非本地定义但是在 code object 当中使用的名字。
co_nlocals,这个字段表示在一个 code object 当中本地使用的变量个数。
co_stackszie,因为 python 虚拟机是一个栈式计算机,这个参数的值表示这个栈需要的最大的值。
co_cellvars,co_freevars,这两个字段主要和嵌套函数和函数闭包有关。
The above is the detailed content of What is the pyc file structure of the python virtual machine?. For more information, please follow other related articles on the PHP Chinese website!