
Detailed introduction to Python's use of struct to process binary (pack and unpack usage)

高洛峰
Original, 2017-03-19 14:49:49

Sometimes you need to process binary data in Python, for example when reading or writing files or working with sockets. Python's struct module handles this: it converts between Python values and binary data laid out like the structures (structs) of the C language.

The three most important functions in the struct module are pack(), unpack() and calcsize():

pack(fmt, v1, v2, ...): packs the values v1, v2, ... into binary data according to the format string fmt. The result is a byte stream laid out like a C struct (a str in Python 2, a bytes object in Python 3).

unpack(fmt, buffer): parses the byte stream buffer according to fmt and returns the parsed values as a tuple. The buffer must be exactly calcsize(fmt) bytes long.

calcsize(fmt): returns the number of bytes that the format fmt occupies.
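A minimal sketch of the three functions working together, assuming Python 3 (where pack() returns bytes). The format '<hid' is an illustrative choice: little-endian standard sizes for a short, an int and a double.

```python
import struct

fmt = '<hid'  # little-endian, standard sizes: short (2) + int (4) + double (8)

packed = struct.pack(fmt, 1, 2, 3.5)  # a bytes object
print(len(packed), struct.calcsize(fmt))  # 14 14

values = struct.unpack(fmt, packed)  # unpack always returns a tuple
print(values)  # (1, 2, 3.5)
```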

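The format characters supported by struct are listed below. The sizes are the standard sizes, which apply when the format string starts with <, >, ! or =. In native mode (@, the default) sizes such as those of l and L depend on the platform.

Format   C type               Python type          Size (bytes)
x        pad byte             no value             1
c        char                 bytes of length 1    1
b        signed char          integer              1
B        unsigned char        integer              1
?        _Bool                bool                 1
h        short                integer              2
H        unsigned short       integer              2
i        int                  integer              4
I        unsigned int         integer              4
l        long                 integer              4
L        unsigned long        integer              4
q        long long            integer              8
Q        unsigned long long   integer              8
f        float                float                4
d        double               float                8
s        char[]               bytes                1 per character
p        char[]               bytes                1 per character
P        void *               integer              native only

In Python 2, c, s and p use str instead of bytes, and the larger integer codes may return long.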
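The standard sizes in the table can be checked with calcsize(). This is a small sketch assuming Python 3; the '<' prefix selects standard sizes, while the last line shows that a native size can differ from the table.

```python
import struct

# Standard sizes apply when the format starts with '<', '>', '!' or '='.
for code in 'xcbB?hHiIlLqQfd':
    print(code, struct.calcsize('<' + code))

# Native sizes ('@', the default) depend on the platform: for example 'l'
# is 8 bytes on 64-bit Linux but 4 bytes on 64-bit Windows.
print('native l:', struct.calcsize('@l'))
```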

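Note 1. In native mode, q and Q require the platform's C compiler to support long long; in standard modes they are always available.

Note 2. A format character can be preceded by an integer repeat count; for example, '4h' means the same as 'hhhh'.

Note 3. For s the count is the length of the string rather than a repeat count: 4s is one 4-byte string, while 4c is four separate characters. p encodes a Pascal string: one length byte followed by the data, where the count is the total number of bytes including the length byte.

Note 4. P converts a pointer (void *). Its size depends on the machine word length, and it is only available in native mode.

Note 5. On a 32-bit build P therefore occupies 4 bytes; on a 64-bit build it occupies 8.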
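The notes on counts, s, p and P can be seen directly in a short Python 3 sketch (the sample byte strings are arbitrary):

```python
import struct

# A repeat count: '4h' is the same as 'hhhh'.
assert struct.calcsize('<4h') == struct.calcsize('<hhhh') == 8

# For 's' the count is the string length: one 4-byte field, null-padded.
print(struct.pack('<4s', b'ab'))      # b'ab\x00\x00'
print(struct.unpack('<4s', b'abcd'))  # (b'abcd',)

# For 'c' the count repeats: three separate 1-byte values.
print(struct.unpack('<3c', b'xyz'))   # (b'x', b'y', b'z')

# 'p' (Pascal string): a length byte, then the data, padded to the count.
print(struct.pack('<5p', b'hi'))      # b'\x02hi\x00\x00'

# 'P' (pointer) exists only in native mode; its size follows the build.
print(struct.calcsize('P'))           # 8 on a 64-bit build, 4 on 32-bit
```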

To exchange data with C structs you must also take byte order and alignment into account. Many C and C++ compilers align struct members on their natural boundaries (a 4-byte int, for example, starts at an offset that is a multiple of 4) and insert padding bytes to achieve this. By default struct uses the native byte order, sizes and alignment of the local machine. The first character of the format string changes this:

Character   Byte order               Size       Alignment
@           native                   native     native (padding inserted)
=           native                   standard   none
<           little-endian            standard   none
>           big-endian               standard   none
!           network (= big-endian)   standard   none

Put the character at the first position of fmt, for example '@5s6sif'. If it is omitted, @ is assumed.
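A small Python 3 sketch of the byte-order and alignment characters. The exact native size printed on the last-but-one line depends on the platform, so it is only described as typical.

```python
import struct

# The same unsigned int in different byte orders.
print(struct.pack('<I', 1))  # b'\x01\x00\x00\x00'
print(struct.pack('>I', 1))  # b'\x00\x00\x00\x01'
print(struct.pack('!I', 1) == struct.pack('>I', 1))  # True

# Alignment: '@' may insert padding after the short so the int starts on a
# 4-byte boundary; '=' (like '<', '>' and '!') never adds padding.
print(struct.calcsize('@hi'))  # platform-dependent, typically 8
print(struct.calcsize('=hi'))  # 6
```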

Example 1:

The structure is as follows:

struct Header
{
    unsigned short id;
    char tag[4];
    unsigned int version;
    unsigned int count;
};
Suppose data for this structure has been received with socket.recv() and stored in s (a bytes object in Python 3). It can be parsed with the unpack() function:

import struct
id, tag, version, count = struct.unpack("!H4s2I", s)
In the format string, ! means the data is parsed in network byte order: the data comes from the network, and it is transmitted there in network (big-endian) byte order. H is the unsigned short id, 4s is a 4-byte string, and 2I is two unsigned int values.

After a single unpack() call, the fields are stored in id, tag, version and count.
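A runnable Python 3 sketch of the unpacking step. The 14 sample bytes are made up for illustration, and id_ is used instead of id to avoid shadowing the built-in. Note that unpack() requires exactly calcsize(fmt) bytes; unpack_from() parses the start of a longer buffer. The last line compares the wire format with the native layout, where a compiler typically pads the in-memory struct Header.

```python
import struct

HEADER_FMT = '!H4s2I'

# Hypothetical sample: 14 bytes as they might arrive from the network.
s = bytes.fromhex('0007') + b'TAG1' + bytes.fromhex('00000002' '0000000a')

id_, tag, version, count = struct.unpack(HEADER_FMT, s)
print(id_, tag, version, count)  # 7 b'TAG1' 2 10

# unpack_from() reads the header from the start of a longer buffer.
print(struct.unpack_from(HEADER_FMT, s + b'payload'))

# '!' uses standard sizes and no padding, so the wire format is 14 bytes.
print(struct.calcsize(HEADER_FMT))  # 14
# Native '@' adds padding after tag so version is 4-byte aligned.
print(struct.calcsize('@H4s2I'))  # platform-dependent, typically 16
```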

Similarly, local data can be packed into the same layout with pack():

ss = struct.pack("!H4s2I", id, tag, version, count)
pack() converts id, tag, version and count into the Header layout described by the format. ss is now a bytes object (a byte stream laid out like the C struct) and can be sent with socket.send(ss).
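A single recv() call may return fewer bytes than requested, so a receiver usually loops until the whole header has arrived. The sketch below assumes Python 3 and uses socket.socketpair() as a stand-in for a real connection; recv_exact() is a hypothetical helper name, not part of the socket module, and the field values are made up.

```python
import socket
import struct

HEADER_FMT = '!H4s2I'
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 14


def recv_exact(sock, n):
    """Hypothetical helper: read exactly n bytes, since recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('socket closed before %d bytes arrived' % n)
        buf += chunk
    return bytes(buf)


# A connected pair of sockets stands in for a real client and server.
sender, receiver = socket.socketpair()
ss = struct.pack(HEADER_FMT, 7, b'TAG1', 2, 10)
sender.sendall(ss)  # sendall() keeps sending until every byte is written

header = struct.unpack(HEADER_FMT, recv_exact(receiver, HEADER_SIZE))
print(header)  # (7, b'TAG1', 2, 10)

sender.close()
receiver.close()
```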

Example 2:

import struct
a = 12.34
# Convert a to binary: 'd' packs it as an 8-byte C double
data = struct.pack('d', a)

Now data is a bytes object holding exactly the bytes that a C double with the value of a occupies in memory.

Then perform the reverse operation and convert the binary data back into a Python value:

# Note: unpack always returns a tuple, hence the trailing comma
a, = struct.unpack('d', data)
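Why 'd' rather than 'i' or 'f' for a float: a short Python 3 sketch comparing the three format characters on the same value.

```python
import struct

a = 12.34

# 'i' expects an integer, so packing a float fails.
try:
    struct.pack('i', a)
except struct.error as exc:
    print('i:', exc)

# 'f' stores a 4-byte single-precision value: close, but not exact.
f_value, = struct.unpack('f', struct.pack('f', a))
print(f_value == a, abs(f_value - a) < 1e-6)  # False True

# 'd' stores an 8-byte double, the same precision as a Python float.
d_value, = struct.unpack('d', struct.pack('d', a))
print(d_value == a)  # True
```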
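If there are several values, they can be packed in one call (in Python 3 the s format requires bytes, hence the b'' literals):

a = b'hello'
b = b'world!'
c = 2
d = 45.123
data = struct.pack('5s6sif', a, b, c, d)

data now holds the values in binary form and can be written directly to a file opened in binary mode, for example binfile.write(data).

Later, when the data is needed, read it back with data = binfile.read()

and decode it into Python variables with struct.unpack():

a, b, c, d = struct.unpack('5s6sif', data)

Because f is a 4-byte single-precision float, d comes back as a close approximation of 45.123 rather than the exact value; use the d (double) format for full precision.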
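The write-then-read round trip as a self-contained Python 3 sketch; the file name record.bin and the temporary directory are assumptions for illustration.

```python
import os
import struct
import tempfile

fmt = '5s6sif'
data = struct.pack(fmt, b'hello', b'world!', 2, 45.123)

with tempfile.TemporaryDirectory() as tmpdir:
    filepath = os.path.join(tmpdir, 'record.bin')  # hypothetical file name

    with open(filepath, 'wb') as binfile:  # binary mode: bytes written unchanged
        binfile.write(data)

    with open(filepath, 'rb') as binfile:
        a, b, c, d = struct.unpack(fmt, binfile.read())

print(a, b, c)  # b'hello' b'world!' 2
print(round(d, 3))  # 45.123 ('f' is single precision, so d is approximate)
```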

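'5s6sif' is called fmt, the format string. It consists of format characters, optionally preceded by counts: 5s means a string occupying 5 bytes, 2i means two integers, and so on. The format table above lists the available characters, and its C type and Python type columns show how each C type maps to a Python type.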
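When the same fmt is used many times, it can be compiled once with struct.Struct, which offers the same pack() and unpack() as methods plus a size attribute. A Python 3 sketch; iter_unpack() (Python 3.4+) parses a buffer holding several consecutive records, such as a file of fixed-size entries.

```python
import struct

# Compile the format once and reuse it.
record = struct.Struct('5s6sif')
print(record.size)  # platform-dependent, typically 20 (includes native padding)

packed = record.pack(b'hello', b'world!', 2, 45.123)
name, word, number, value = record.unpack(packed)

# Parse three consecutive records from one buffer.
for fields in record.iter_unpack(packed * 3):
    print(fields[:3])  # (b'hello', b'world!', 2)
```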

Note: problems you may run into when processing binary files

When processing binary files, open them as follows:

binfile = open(filepath, 'rb')  # read a binary file
binfile = open(filepath, 'wb')  # write a binary file

So how does this differ from binfile = open(filepath, 'r')?

There are two differences:

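First, in text mode ('r') a 0x1A byte can be treated as end of file (EOF). This is the behaviour of Python 2 on Windows, whose text mode relied on the C runtime's handling of Ctrl+Z. 'rb' does not have this problem: if you write data in binary mode and read it back in text mode, and the data contains 0x1A, only part of the file is read, whereas 'rb' always reads to the end of the file. Python 3 no longer stops at 0x1A, but its text mode decodes bytes into str, which can fail or alter arbitrary binary data, so 'rb' is still the right choice.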

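Second, consider the string x = 'abc\ndef'. len(x) gives 7; \n is the newline character, which is actually the byte 0x0A. When you write in text mode ('w') on Windows, each 0x0A is automatically converted into the two bytes 0x0D 0x0A, so the file actually ends up 8 bytes long. Reading in text mode ('r') converts the pair back into the original newline. Writing in binary mode ('wb') keeps the single byte, and reading in binary mode returns the bytes exactly as stored. So if you write in text mode and read in binary mode, you have to account for the extra byte. 0x0D is the carriage return character. On Linux nothing changes, because Linux uses only 0x0A to mark a line break.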
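The newline translation can be reproduced on any platform in Python 3 by passing newline='\r\n' to open(), which makes text-mode writes behave as they do by default on Windows. A sketch, with the file name text.txt as an assumption:

```python
import os
import tempfile

x = 'abc\ndef'
print(len(x))  # 7

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'text.txt')  # hypothetical file name

    # newline='\r\n' translates each '\n' on write, as Windows text mode does.
    with open(path, 'w', newline='\r\n') as f:
        f.write(x)

    with open(path, 'rb') as f:  # binary: the bytes actually stored
        raw = f.read()
    with open(path, 'r') as f:  # text: '\r\n' is translated back to '\n'
        text = f.read()

print(raw, len(raw))  # b'abc\r\ndef' 8
print(text == x)  # True
```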
