
A simple way to read and write binary files using Python

高洛峰 (original), 2017-02-23 11:22:32

Python has no built-in syntax for laying out values as raw, C-style binary data, but the standard library fills the gap with the struct module.

Python can still hold binary data. In Python 2 this was done with the ordinary str type, which works because a Python 2 string is simply a sequence of 1-byte characters. In Python 3 the dedicated bytes type plays this role, and str is reserved for Unicode text.

import struct

a = 12.34

# Convert a into binary. 'd' packs a C double (8 bytes); the format 'i'
# only accepts integers, and in Python 3 it raises struct.error for a float.
data = struct.pack('d', a)

At this point data is a bytes object (a str in Python 2) whose contents are exactly the 8-byte binary representation of a.

Reverse operation

Given the binary data (a bytes object), convert it back into a Python value:

a, = struct.unpack('d', data)

Note that unpack always returns a tuple, even when the format describes a single value.

So if only one value was packed:

data = struct.pack('d', a)

then you need to unpack it like this:

a, = struct.unpack('d', data)    or    (a,) = struct.unpack('d', data)

If you write a = struct.unpack('d', data) directly, a becomes the tuple (12.34,) instead of the original floating-point number.
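A minimal round-trip sketch of the single-value case, written for Python 3. It packs 12.34 both as a double ('d') and as a single-precision float ('f'), checks the tuple that unpack returns, and shows that 'i' rejects a float. The helper name roundtrip is hypothetical, introduced only for this illustration.

```python
import struct

def roundtrip(fmt, value):
    """Hypothetical helper: pack one value with fmt, then unpack it again."""
    packed = struct.pack(fmt, value)
    (result,) = struct.unpack(fmt, packed)  # unpack always returns a tuple
    return packed, result

a = 12.34

packed_d, back_d = roundtrip('d', a)  # 8-byte double: exact round trip
packed_f, back_f = roundtrip('f', a)  # 4-byte float: some precision is lost

print(len(packed_d), back_d)  # 8 12.34
print(len(packed_f), back_f)  # 4, then a value close to but not equal to 12.34

# Without tuple unpacking you get a 1-element tuple, not a float
whole = struct.unpack('d', packed_d)
print(whole)  # (12.34,)

# 'i' only accepts integers; in Python 3 a float raises struct.error
try:
    struct.pack('i', a)
except struct.error as exc:
    print('struct.error:', exc)
```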

If it is composed of multiple data, it can be like this:

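a = b'hello'    # 's' fields take bytes in Python 3 (str in Python 2)
b = b'world!'
c = 2
d = 45.123

data = struct.pack('5s6sif', a, b, c, d)

data now holds all four fields in binary form and can be written straight to a file opened in binary mode, e.g. binfile.write(data).

Later, when we need the values again, we read the file back with data = binfile.read()

and decode it into Python variables with struct.unpack():

a, b, c, d = struct.unpack('5s6sif', data)

Because 'f' stores a 4-byte single-precision float, d comes back as approximately 45.1230011 rather than exactly 45.123; use 'd' when full precision matters.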
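A runnable sketch of the whole write-then-read cycle in Python 3, assuming a temporary file (created with tempfile) stands in for the real filepath. The string fields are bytes, and d is compared approximately because 'f' is single precision.

```python
import os
import struct
import tempfile

FMT = '5s6sif'  # 5-byte string, 6-byte string, int, float (native alignment)

a, b, c, d = b'hello', b'world!', 2, 45.123

# Assumption: a temporary file plays the role of filepath
fd, path = tempfile.mkstemp()
os.close(fd)

# Write the packed record in binary mode
with open(path, 'wb') as binfile:
    binfile.write(struct.pack(FMT, a, b, c, d))

# Read it back and decode it into Python values
with open(path, 'rb') as binfile:
    data = binfile.read()
os.remove(path)

a2, b2, c2, d2 = struct.unpack(FMT, data)
print(len(data) == struct.calcsize(FMT))  # True: the file holds exactly one record
print(a2, b2, c2)                         # b'hello' b'world!' 2
print(round(d2, 3))                       # 45.123
```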

'5s6sif' is called the format string (fmt). It consists of type characters, each optionally preceded by a count: 5s means a 5-byte string, 2i means two integers (the same as ii), and so on. Each character corresponds to one C type and one Python type.
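A short sketch of how counts behave in a format string. It uses the '<' prefix, which selects standard sizes with no padding, so the sizes do not depend on the platform. Note that for 's' the count is a byte length rather than a repeat count.

```python
import struct

# A count before a type character repeats it: '<2i' has the same layout as '<ii'
print(struct.calcsize('<2i'), struct.calcsize('<ii'))  # 8 8
print(struct.unpack('<2i', struct.pack('<ii', 7, -1)))  # (7, -1)

# For 's' the count is the field length in bytes
print(struct.pack('5s', b'hi'))          # b'hi\x00\x00\x00' (padded with NUL bytes)
print(struct.pack('5s', b'helloworld'))  # b'hello' (truncated)

# 5 + 6 + 4 + 4 bytes with standard sizes and no padding
print(struct.calcsize('<5s6sif'))        # 19
```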


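The available format characters are listed below; the C Type column gives the corresponding C type and the Python type column the Python value it maps to.

Format  C Type              Python type              Size (bytes)
x       pad byte            no value                 1
c       char                bytes of length 1        1
b       signed char         integer                  1
B       unsigned char       integer                  1
?       _Bool               bool                     1
h       short               integer                  2
H       unsigned short      integer                  2
i       int                 integer                  4
I       unsigned int        integer                  4
l       long                integer                  4
L       unsigned long       integer                  4
q       long long           integer                  8
Q       unsigned long long  integer                  8
f       float               float                    4
d       double              float                    8
s       char[]              bytes (str in Python 2)  1 per character
p       char[]              bytes (str in Python 2)  1 per character
P       void *              integer                  native only

The sizes shown are the standard sizes, used when fmt starts with =, <, > or !. In native mode (@, the default) sizes follow the platform's C compiler; for example, l and L are 8 bytes on most 64-bit Linux systems. The last one, P, represents a pointer type: it is only available in native mode and occupies 4 bytes on a 32-bit system and 8 bytes on a 64-bit one.

To exchange data with C structures you also have to consider that many C/C++ compilers align structure members; on a typical 32-bit system, fields are padded to 4-byte boundaries. The first character of fmt therefore selects the byte order, size and alignment:

Character  Byte order              Size and alignment
@          native                  native (padded for alignment, e.g. to 4 bytes)
=          native                  standard (original sizes, no padding)
<          little-endian           standard (original sizes, no padding)
>          big-endian              standard (original sizes, no padding)
!          network (= big-endian)  standard (original sizes, no padding)

To use one, put it in the first position of fmt, for example '@5s6sif'. If fmt has no such character, @ is assumed.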
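A sketch comparing the prefixes. The exact native size depends on the platform's C compiler, so the comment gives the value seen on common x86 and ARM systems rather than a guarantee.

```python
import struct

# The same int in little-endian, big-endian and network byte order
print(struct.pack('<i', 1))  # b'\x01\x00\x00\x00'
print(struct.pack('>i', 1))  # b'\x00\x00\x00\x01'
print(struct.pack('!i', 1))  # b'\x00\x00\x00\x01' ('!' is big-endian)

# Native mode ('@', the default) may insert padding for alignment;
# '=' uses standard sizes and never pads
native = struct.calcsize('@5s6sif')
standard = struct.calcsize('=5s6sif')
print(native, standard)  # typically 20 19: one pad byte aligns the int to 4 bytes
```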

----- Problems encountered when processing binary files -----

When processing binary files, open them in binary mode:

binfile = open(filepath, 'rb')    to read a binary file

or

binfile = open(filepath, 'wb')    to write a binary file
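In current Python the usual pattern is to open the file in a with statement so it is closed automatically. The sketch below assumes a hypothetical record layout '<if' (an int id followed by a float value) and a temporary file in place of filepath; it writes several fixed-size records and reads them back with Struct.iter_unpack, which is available since Python 3.4.

```python
import os
import struct
import tempfile

RECORD = struct.Struct('<if')  # hypothetical layout: int id, float value (8 bytes)

records = [(1, 0.5), (2, 1.25), (3, -2.0)]

# Assumption: a temporary file plays the role of filepath
fd, path = tempfile.mkstemp()
os.close(fd)

# 'wb': write raw bytes with no newline translation
with open(path, 'wb') as binfile:
    for rec in records:
        binfile.write(RECORD.pack(*rec))

# 'rb': read the raw bytes back and split them into fixed-size records
with open(path, 'rb') as binfile:
    data = binfile.read()
os.remove(path)

decoded = list(RECORD.iter_unpack(data))
print(len(data), decoded)  # 24 [(1, 0.5), (2, 1.25), (3, -2.0)]
```

Precompiling the layout with struct.Struct avoids re-parsing the format string for every record.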

So what is the difference between binfile=open(filepath,'r') and binfile=open(filepath,'rb')?

There are two differences:

First, in text mode ('r') on Windows, Python 2 relied on the C runtime, which treats the byte 0x1A (Ctrl-Z) as the end of the file (EOF). Binary mode ('rb') has no such problem. So if you write a file in binary mode and read it back in text mode, reading stops at the first 0x1A and only part of the file is returned; with 'rb', the file is read all the way to its real end.
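A small check, assuming a temporary file, that binary mode reads past a 0x1A byte. Python 3 handles text decoding in its own io layer, so the truncation described above is mainly a concern for Python 2 on Windows; what the sketch confirms is that 'rb' returns every byte unchanged.

```python
import os
import tempfile

payload = b'abc\x1adef'  # contains 0x1A (Ctrl-Z) in the middle

fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, 'wb') as f:
    f.write(payload)

# Binary mode returns every byte, including 0x1A and everything after it
with open(path, 'rb') as f:
    data = f.read()
os.remove(path)

print(data == payload, len(data))  # True 7
```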

Second, for the string x='abc\ndef', len(x) is 7. \n is the newline character, which is the byte 0x0A. When it is written in text mode ('w') on Windows, each 0x0A is automatically converted into the two bytes 0x0D 0x0A, so the file actually holds 8 bytes; reading it back in text mode ('r') converts them back into a single newline. If you write in binary mode ('wb') instead (in Python 3 the data must then be bytes, e.g. b'abc\ndef'), the newline stays one byte and is read back exactly as written. So if you write in text mode and read in binary mode, you have to account for this extra byte. 0x0D is the carriage return character.
No such conversion happens on Linux, because Linux uses 0x0A alone to mark line breaks.
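A portable sketch of the newline translation, assuming a temporary file. Passing newline='\r\n' to open() in text mode forces the Windows-style translation on any platform, so the extra byte can be observed even on Linux.

```python
import os
import tempfile

x = 'abc\ndef'
print(len(x))  # 7

fd, path = tempfile.mkstemp()
os.close(fd)

# Text mode with newline='\r\n' translates '\n' to '\r\n' on write,
# which is what text mode does by default on Windows
with open(path, 'w', newline='\r\n') as f:
    f.write(x)
with open(path, 'rb') as f:
    raw_text_mode = f.read()    # b'abc\r\ndef', 8 bytes
with open(path, 'r') as f:
    back = f.read()             # universal newlines: 'abc\ndef' again

# Binary mode writes the bytes unchanged
with open(path, 'wb') as f:
    f.write(x.encode('ascii'))
with open(path, 'rb') as f:
    raw_binary_mode = f.read()  # b'abc\ndef', 7 bytes
os.remove(path)

print(len(raw_text_mode), back == x, len(raw_binary_mode))  # 8 True 7
```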


Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn