search
HomeBackend DevelopmentPython TutorialWhat is the pyc file structure of the python virtual machine?

    PYC file

    The pyc file is a bytecode file generated by Python when interpreting and executing the source code. It contains the compilation results of the source code. and related metadata information so that Python can load and execute code faster.

    Different from compiled languages, Python is an interpreted language and does not directly compile source code into machine code and execute it. Before running the code, the Python interpreter will first compile the source code into bytecode, and then interpret the bytecode for execution. The .pyc file is the bytecode file generated during this process.

    When the Python interpreter executes a .py file for the first time, it will generate a corresponding .pyc file in the same directory so that it can be executed faster the next time the file is loaded. When the source file is modified and reloaded, the interpreter regenerates the .pyc file to update the cached bytecode.

    Generate PYC file

    Normal python files need to be turned into bytecode through the compiler, and then the bytecode is handed over to the python virtual machine, and then the python virtual machine executes the bytecode. The overall process is as follows:

    What is the pyc file structure of the python virtual machine?

    #We can directly use the compile all module to generate the pyc file of the corresponding file.

    ➜  pvm ls
    demo.py  hello.py
    ➜  pvm python -m compileall .
    Listing '.'...
    Listing './.idea'...
    Listing './.idea/inspectionProfiles'...
    Compiling './demo.py'...
    Compiling './hello.py'...
    ➜  pvm ls
    __pycache__ demo.py     hello.py
    ➜  pvm ls __pycache__ 
    demo.cpython-310.pyc  hello.cpython-310.pyc

    python -m compileall . The command will recursively scan the py files in the current directory and generate the pyc file of the corresponding file.

    PYC file layout

    What is the pyc file structure of the python virtual machine?

    ##The first part The magic number consists of two parts:

    What is the pyc file structure of the python virtual machine?

    The first part The magic is composed of a 2-byte integer and two other characters, carriage return and line feed. "\r\n" also occupies two bytes, making a total of four bytes. This two-byte integer is different in different python versions. For example, in python3.5, this value is 3351, etc., and in python3.9, this value is 3420, 3421, 3422, 3423, 3424, etc. ( in a minor version of Python 3.9).

    Part 2 Bit Field The main purpose of this field is to enable reproducible compilation results in the future, but in python3.9a2, the values ​​of this field are still all 0. Please refer to PEP552 - Deterministic pyc for details. This field does not exist in early versions of python2 and python3 (not yet in python3.5). This field only appears in later versions of python3.

    The third part is the size of the entire py source file.

    The fourth part is also the most important part of the entire pyc file. The last part is the data after serialization of a CodeObject object. We will carefully analyze the data related to this object later.

    Let’s now analyze a pyc file in detail. The corresponding python code is:

    def f():
        x = 1
        return 2

    The hexadecimal form of the pyc file is as follows:

    ➜  __pycache__ hexdump -C hello.cpython-310.pyc
    00000000  6f 0d 0d 0a 00 00 00 00  b9 48 21 64 20 00 00 00  |o........H!d ...|
    00000010  e3 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    00000020  00 02 00 00 00 40 00 00  00 73 0c 00 00 00 64 00  |.....@...s....d.|
    00000030  64 01 84 00 5a 00 64 02  53 00 29 03 63 00 00 00  |d...Z.d.S.).c...|
    00000040  00 00 00 00 00 00 00 00  00 01 00 00 00 01 00 00  |................|
    00000050  00 43 00 00 00 73 08 00  00 00 64 01 7d 00 64 02  |.C...s....d.}.d.|
    00000060  53 00 29 03 4e e9 01 00  00 00 e9 02 00 00 00 a9  |S.).N...........|
    00000070  00 29 01 da 01 78 72 03  00 00 00 72 03 00 00 00  |.)...xr....r....|
    00000080  fa 0a 2e 2f 68 65 6c 6c  6f 2e 70 79 da 01 66 01  |.../hello.py..f.|
    00000090  00 00 00 73 04 00 00 00  04 01 04 01 72 06 00 00  |...s........r...|
    000000a0  00 4e 29 01 72 06 00 00  00 72 03 00 00 00 72 03  |.N).r....r....r.|
    000000b0  00 00 00 72 03 00 00 00  72 05 00 00 00 da 08 3c  |...r....r......<|
    000000c0  6d 6f 64 75 6c 65 3e 01  00 00 00 73 02 00 00 00  |module>....s....|
    000000d0  0c 00                                             |..|
    000000d2

    Because of data usage Little endian representation, so for the above data:

    • The first part of the magic number is: 0xa0d0d6f.

    • The second part Bit Field is: 0x0.

    • The last modification date of the third part is: 0x642148b9.

    • The file size of the fourth part is: 0x20 bytes, which means the size of the hello.py file is 32 bytes.

    The following is a small code snippet for reading the header meta information of the pyc file:

    import struct
    import time
    import binascii
    fname = "./__pycache__/hello.cpython-310.pyc"
    f = open(fname, "rb")
    magic = struct.unpack(&#39;<l&#39;, f.read(4))[0]
    bit_filed = f.read(4)
    print(f"bit field = {binascii.hexlify(bit_filed)}")
    moddate = f.read(4)
    filesz = f.read(4)
    modtime = time.asctime(time.localtime(struct.unpack(&#39;<l&#39;, moddate)[0]))
    filesz = struct.unpack(&#39;<L&#39;, filesz)
    print("magic %s" % (hex(magic)))
    print("moddate (%s)" % (modtime))
    print("File Size %d" % filesz)
    f.close()

    The output of the above code is as follows:

    bit field = b'00000000'

    magic 0xa0d0d6f
    moddate (Mon Mar 27 15:41:45 2023)
    File Size 32

    About pyc For detailed file operations, please view the source code of the python standard library importlib/_bootstrap_external.py file.

    CODEOBJECT

    In CPython,

    CodeObject is an object, which contains the bytecode, constants, variables, positional parameters, keyword parameters and other information of the Python code , and some metadata used to run the code, such as file name, code line number, etc.

    In CPython, when we execute a Python module or function, the interpreter will first compile its code into

    CodeObject and then execute it. During compilation, the interpreter converts Python code into bytecode and saves it in a CodeObject object. Thereafter, whenever we call that module or function, the interpreter will use the bytecode in CodeObject to execute the code.

    CodeObject Objects are immutable and cannot be modified once created. This is because the bytecode of Python code is immutable, and the CodeObject object contains these bytecodes, so it is also immutable.

    This article mainly introduces the main contents of the code object and briefly introduces their functions. In subsequent articles, the source code corresponding to the code object and the detailed functions of the corresponding fields will be carefully analyzed.

    现在举一个例子来分析一下 pycdemo.py 的 pyc 文件,pycdemo.py 的源程序如下所示:

    if __name__ == &#39;__main__&#39;:
        a = 100
        print(a)

    下面的代码是一个用于加载 pycdemo01.cpython-39.pyc 文件(也就是 hello.py 对应的 pyc 文件)的代码,使用 marshal 读取 pyc 文件里面的 code object 。

    import marshal
    import dis
    import struct
    import time
    import types
    import binascii
    def print_metadata(fp):
        magic = struct.unpack(&#39;<l&#39;, fp.read(4))[0]
        print(f"magic number = {hex(magic)}")
        bit_field = struct.unpack(&#39;<l&#39;, fp.read(4))[0]
        print(f"bit filed = {bit_field}")
        t = struct.unpack(&#39;<l&#39;, fp.read(4))[0]
        print(f"time = {time.asctime(time.localtime(t))}")
        file_size = struct.unpack(&#39;<l&#39;, fp.read(4))[0]
        print(f"file size = {file_size}")
    def show_code(code, indent=&#39;&#39;):
        print ("%scode" % indent)
        indent += &#39;   &#39;
        print ("%sargcount %d" % (indent, code.co_argcount))
        print ("%snlocals %d" % (indent, code.co_nlocals))
        print ("%sstacksize %d" % (indent, code.co_stacksize))
        print ("%sflags %04x" % (indent, code.co_flags))
        show_hex("code", code.co_code, indent=indent)
        dis.disassemble(code)
        print ("%sconsts" % indent)
        for const in code.co_consts:
            if type(const) == types.CodeType:
                show_code(const, indent+&#39;   &#39;)
            else:
                print("   %s%r" % (indent, const))
        print("%snames %r" % (indent, code.co_names))
        print("%svarnames %r" % (indent, code.co_varnames))
        print("%sfreevars %r" % (indent, code.co_freevars))
        print("%scellvars %r" % (indent, code.co_cellvars))
        print("%sfilename %r" % (indent, code.co_filename))
        print("%sname %r" % (indent, code.co_name))
        print("%sfirstlineno %d" % (indent, code.co_firstlineno))
        show_hex("lnotab", code.co_lnotab, indent=indent)
    def show_hex(label, h, indent):
        h = binascii.hexlify(h)
        if len(h) < 60:
            print("%s%s %s" % (indent, label, h))
        else:
            print("%s%s" % (indent, label))
            for i in range(0, len(h), 60):
                print("%s   %s" % (indent, h[i:i+60]))
    if __name__ == &#39;__main__&#39;:
        filename = "./__pycache__/pycdemo01.cpython-39.pyc"
        with open(filename, "rb") as fp:
            print_metadata(fp)
            code_object = marshal.load(fp)
            show_code(code_object)

    执行上面的程序输出结果如下所示:

    magic number = 0xa0d0d61
    bit filed = 0
    time = Tue Mar 28 02:40:20 2023
    file size = 54
    code
       argcount 0
       nlocals 0
       stacksize 2
       flags 0040
       code b&#39;650064006b02721464015a01650265018301010064025300&#39;
      3           0 LOAD_NAME                0 (__name__)
                  2 LOAD_CONST               0 (&#39;__main__&#39;)
                  4 COMPARE_OP               2 (==)
                  6 POP_JUMP_IF_FALSE       20
      4           8 LOAD_CONST               1 (100)
                 10 STORE_NAME               1 (a)
      5          12 LOAD_NAME                2 (print)
                 14 LOAD_NAME                1 (a)
                 16 CALL_FUNCTION            1
                 18 POP_TOP
            >>   20 LOAD_CONST               2 (None)
                 22 RETURN_VALUE
       consts
          &#39;__main__&#39;
          100
          None
       names (&#39;__name__&#39;, &#39;a&#39;, &#39;print&#39;)
       varnames ()
       freevars ()
       cellvars ()
       filename &#39;./pycdemo01.py&#39;
       name &#39;<module>&#39;
       firstlineno 3
       lnotab b&#39;08010401&#39;

    下面是 code object 当中各个字段的作用:

    • 首先需要了解一下代码块这个概念,所谓代码块就是一个小的 python 代码,被当做一个小的单元整体执行。在 Python 中常见的代码块包括函数体、类的定义和模块。

    • argcount,这个表示一个代码块的参数个数,这个参数只对函数体代码块有用,因为函数可能会有参数,比如上面的 pycdemo.py 是一个模块而不是一个函数,因此这个参数对应的值为 0 。

    • co_code,这个对象的具体内容就是一个字节序列,存储真实的 python 字节码,主要是用于 python 虚拟机执行的,在本篇文章当中暂时不详细分析。

    • co_consts,这个字段是一个列表类型的字段,主要是包含一些字符串常量和数值常量,比如上面的 ";main" 和 100 。

    • co_filename,这个字段的含义就是对应的源文件的文件名。

    • co_firstlineno,这个字段的含义为在 python 源文件当中第一行代码出现的行数,这个字段在进行调试的时候非常重要。

    • 主要含义是标识该 code object 的类型的字段是 co_flags。0x0080 表示这个 block 是一个协程,0x0010 表示这个 code object 是嵌套的等等。

    • co_lnotab,这个字段的含义主要是用于计算每个字节码指令对应的源代码行数。

    • The main purpose of the field "co_varnames" is to indicate a name defined locally in a code object.。

    • co_names,和 co_varnames 相反,表示非本地定义但是在 code object 当中使用的名字。

    • co_nlocals,这个字段表示在一个 code object 当中本地使用的变量个数。

    • co_stackszie,因为 python 虚拟机是一个栈式计算机,这个参数的值表示这个栈需要的最大的值。

    • co_cellvars,co_freevars,这两个字段主要和嵌套函数和函数闭包有关。

    The above is the detailed content of What is the pyc file structure of the python virtual machine?. For more information, please follow other related articles on the PHP Chinese website!

    Statement
    This article is reproduced at:亿速云. If there is any infringement, please contact admin@php.cn delete
    详细讲解Python之Seaborn(数据可视化)详细讲解Python之Seaborn(数据可视化)Apr 21, 2022 pm 06:08 PM

    本篇文章给大家带来了关于Python的相关知识,其中主要介绍了关于Seaborn的相关问题,包括了数据可视化处理的散点图、折线图、条形图等等内容,下面一起来看一下,希望对大家有帮助。

    详细了解Python进程池与进程锁详细了解Python进程池与进程锁May 10, 2022 pm 06:11 PM

    本篇文章给大家带来了关于Python的相关知识,其中主要介绍了关于进程池与进程锁的相关问题,包括进程池的创建模块,进程池函数等等内容,下面一起来看一下,希望对大家有帮助。

    Python自动化实践之筛选简历Python自动化实践之筛选简历Jun 07, 2022 pm 06:59 PM

    本篇文章给大家带来了关于Python的相关知识,其中主要介绍了关于简历筛选的相关问题,包括了定义 ReadDoc 类用以读取 word 文件以及定义 search_word 函数用以筛选的相关内容,下面一起来看一下,希望对大家有帮助。

    归纳总结Python标准库归纳总结Python标准库May 03, 2022 am 09:00 AM

    本篇文章给大家带来了关于Python的相关知识,其中主要介绍了关于标准库总结的相关问题,下面一起来看一下,希望对大家有帮助。

    分享10款高效的VSCode插件,总有一款能够惊艳到你!!分享10款高效的VSCode插件,总有一款能够惊艳到你!!Mar 09, 2021 am 10:15 AM

    VS Code的确是一款非常热门、有强大用户基础的一款开发工具。本文给大家介绍一下10款高效、好用的插件,能够让原本单薄的VS Code如虎添翼,开发效率顿时提升到一个新的阶段。

    Python数据类型详解之字符串、数字Python数据类型详解之字符串、数字Apr 27, 2022 pm 07:27 PM

    本篇文章给大家带来了关于Python的相关知识,其中主要介绍了关于数据类型之字符串、数字的相关问题,下面一起来看一下,希望对大家有帮助。

    python中文是什么意思python中文是什么意思Jun 24, 2019 pm 02:22 PM

    pythn的中文意思是巨蟒、蟒蛇。1989年圣诞节期间,Guido van Rossum在家闲的没事干,为了跟朋友庆祝圣诞节,决定发明一种全新的脚本语言。他很喜欢一个肥皂剧叫Monty Python,所以便把这门语言叫做python。

    详细介绍python的numpy模块详细介绍python的numpy模块May 19, 2022 am 11:43 AM

    本篇文章给大家带来了关于Python的相关知识,其中主要介绍了关于numpy模块的相关问题,Numpy是Numerical Python extensions的缩写,字面意思是Python数值计算扩展,下面一起来看一下,希望对大家有帮助。

    See all articles

    Hot AI Tools

    Undresser.AI Undress

    Undresser.AI Undress

    AI-powered app for creating realistic nude photos

    AI Clothes Remover

    AI Clothes Remover

    Online AI tool for removing clothes from photos.

    Undress AI Tool

    Undress AI Tool

    Undress images for free

    Clothoff.io

    Clothoff.io

    AI clothes remover

    AI Hentai Generator

    AI Hentai Generator

    Generate AI Hentai for free.

    Hot Article

    R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
    2 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
    Repo: How To Revive Teammates
    1 months agoBy尊渡假赌尊渡假赌尊渡假赌
    Hello Kitty Island Adventure: How To Get Giant Seeds
    4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

    Hot Tools

    SublimeText3 Chinese version

    SublimeText3 Chinese version

    Chinese version, very easy to use

    mPDF

    mPDF

    mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

    SublimeText3 Linux new version

    SublimeText3 Linux new version

    SublimeText3 Linux latest version

    MantisBT

    MantisBT

    Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

    SAP NetWeaver Server Adapter for Eclipse

    SAP NetWeaver Server Adapter for Eclipse

    Integrate Eclipse with SAP NetWeaver application server.