How to read binary data in Python?

PHPz

May 08, 2023 pm 06:58 PM

bytes

bytes is an immutable sequence of bytes (integers from 0 to 255). Comparing dir(str) and dir(bytes) shows that the two types share most of their attributes and methods, with only a few differences. As a result, bytes supports many of the same operations as str, such as searching (find), measuring length (len), splitting (split), and slicing.
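As a small illustration of the operations listed above, here is a sketch using a made-up bytes literal (the sample value and variable names are assumptions for demonstration only):

```python
# Hypothetical sample data, used only to demonstrate str-like operations on bytes.
data = b"key=value;flag=on"

print(len(data))            # 17
print(data.find(b"flag"))   # 10 (byte offset of the first match)
print(data.split(b";"))     # [b'key=value', b'flag=on']
print(data[4:9])            # b'value'
print(data[0])              # 107: indexing a single position gives an integer, not a character

# Names that exist on one type but not on the other.
only_in_bytes = set(dir(bytes)) - set(dir(str))
only_in_str = set(dir(str)) - set(dir(bytes))
print(sorted(only_in_bytes))
print(sorted(only_in_str))
```

The exact contents of the two difference sets depend on the Python version, but decode appears only on bytes and encode only on str.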

The advantage of bytes is that it is a built-in Python type, so no additional third-party modules need to be installed.

The disadvantage is just as clear: a search returns only one match per call, so you cannot get every match in one step.

First open the file with open() in rb mode and read its content as a bytes object. The find() method searches for a given byte string, but it returns only the index of the first match, and that index is a byte offset (one unit per 8 bits), not a bit offset. bytes has no built-in findall() method, so finding every match takes more work: find the first match at index 1, search again starting just past index 1 to find index 2, and so on until find() returns -1.

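with open(path, 'rb') as f:  # path: location of the binary file to read
    datas = f.read()  # the whole file as one bytes object
    start_char = datas.find(b'Start')  # byte offset of the first b'Start', or -1
    # start_char2 = datas.find(b'Start', start_char + 1)  # the next b'Start'
    end_char = datas.find(b'End', start_char)  # first b'End' at or after start_char
    # end_char2 = datas.find(b'End', start_char2)
    data = datas[start_char:end_char]  # includes the start marker, excludes the end marker
    print(data)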

Note that in the code above the Start and End markers may each occur many times in the file, and not necessarily the same number of times. The goal is to get the content between each pair of markers, but a single call cannot return every position. The commented-out lines would have to be repeated once per marker to collect each index, and since we don't know in advance how many Start markers the file contains, we don't know how many repetitions are needed. This calls for a loop, but there is no ready-made sequence to iterate over, so the loop has to keep calling find() until nothing more is found. This makes the problem more complex.

Secondly, because we want the content between two markers, the same search has to be carried out twice, once for the start marker and once for the end marker, which complicates the process further.
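To make the repeated search concrete, the following is a minimal sketch of that loop using only bytes.find(). The helper names find_all and extract_between and the sample data are hypothetical, not part of the original article, and the sketch assumes that a start marker with no later end marker ends the search.

```python
def find_all(data, marker):
    """Return the byte offset of every occurrence of marker in data."""
    offsets = []
    pos = data.find(marker)
    while pos != -1:                      # find() returns -1 when nothing is left
        offsets.append(pos)
        pos = data.find(marker, pos + 1)  # resume just past the previous match
    return offsets


def extract_between(data, start_marker, end_marker):
    """Return the slice from each start marker up to the next end marker."""
    chunks = []
    for start in find_all(data, start_marker):
        end = data.find(end_marker, start + len(start_marker))
        if end == -1:                     # assumption: stop at an unmatched start marker
            break
        chunks.append(data[start:end])    # keeps the start marker, drops the end marker
    return chunks


# Hypothetical in-memory sample standing in for the file content.
sample = b"xxStart-one-EndyyStart-two-Endzz"
print(find_all(sample, b"Start"))                 # [2, 17]
print(extract_between(sample, b"Start", b"End"))  # [b'Start-one-', b'Start-two-']
```

This works, but the offset bookkeeping has to be written by hand for every marker pair.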

It is therefore worth looking for a more convenient approach.

bitstring

bitstring is a third-party package that reads binary files as streams of bits.

The first sentence of the bitstring.py file is: This package defines classes that simplify bit-wise creation, manipulation and interpretation of data.

Put simply, it lets you work directly on binary data such as bytes objects.

There are four main classes, as follows:

Bits -- An immutable container for binary data.
BitArray -- A mutable container for binary data.
ConstBitStream -- An immutable container with streaming methods.
BitStream -- A mutable container with streaming methods.

As with bytes, first read the file content, then find the marker indexes, and finally slice to obtain the data.

from bitstring import ConstBitStream

hex_datas = ConstBitStream(filename=path)  # read the file content; path is the file location
start_marker = b'Start'
# findall() finds every match in one call and returns a generator of bit positions
start_indexs = list(hex_datas.findall(start_marker, bytealigned=True))

end_marker = b'End'
end_indexs = []
for start_index in start_indexs:
    # find() returns a tuple: (bit position,) for the first match, or () if there is none
    end_found = hex_datas.find(end_marker, start=start_index, bytealigned=True)
    for end_index in end_found:
        end_indexs.append(end_index)

result = []
for i in range(min(len(start_indexs), len(end_indexs))):
    hex_data = hex_datas[start_indexs[i]:end_indexs[i]]  # slice by bit position
    str_data = hex_data.tobytes().decode('utf-8')  # bits -> bytes -> str
    result.append(str_data)

Code analysis: first import the ConstBitStream class and use it to read the file content. findall() finds every match and yields its position, while find() finds only the first match at or after the given start position. Note that these positions are bit positions rather than byte offsets. The end marker is stored under its own name (end_marker) so that the loop variable does not overwrite it between searches. Take the shorter length of the start and end lists and slice between each pair to obtain the data. Each slice has the type bitstring.ConstBitStream, and its tobytes() method converts it to bytes. Non-ASCII text such as Chinese characters is not readable in its raw bytes form, so use decode() to obtain the required string.
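The decoding step can be seen in isolation with the standard library alone. This is a small sketch with a made-up string; it assumes the data is UTF-8 encoded, as the code above does.

```python
# Hypothetical sample text containing Chinese characters.
text = "二进制 data"

raw = text.encode("utf-8")   # what the text looks like when stored as bytes
print(raw)                   # each Chinese character shows up as three escaped bytes
print(len(text), len(raw))   # 8 characters, 14 bytes

decoded = raw.decode("utf-8")  # decode() restores the readable string
print(decoded)
```

If the file uses a different encoding, the same call with that encoding name applies instead.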

The whole process is concise and reads straight through. The code uses the findall(), find(), and tobytes() methods. There are still some details to watch: for example, if start_indexs is empty, the subsequent code should not run, and the same applies if end_indexs is empty.

As shown, the bitstring package is relatively easy to use. Only a few of its methods were needed here; it offers many others, which can be chosen as needed.

Statement

This article is reproduced from 亿速云. If there is any infringement, please contact admin@php.cn to have it removed.
