An unexpected discovery: what I thought was a bug is actually a feature of Protobuf's design

Hello everyone, this is Amazing (了不起).

Recently our project started using protobuf as the format for storing data. Without realizing it, I dug a big pit for myself, and it took a long time before I noticed.

Introduction to protobuf

protobuf's full name is Protocol Buffers. Developed by Google, it is a language-neutral, platform-neutral, extensible mechanism for serializing structured data. It is similar to XML, but smaller, faster, and simpler. You define how you want your data to be structured once, and then use its code generator to produce source code that handles serialization and deserialization. With that code you can easily write and read structured data to and from a variety of data streams, using a variety of programming languages.

The proto2 version supports code generation in Java, Python, Objective-C, and C++. With the newer proto3 language version, you can also use Kotlin, Dart, Go, Ruby, PHP, C#, and many more languages.

How did I find it?

In our new project, we record the data produced while the system runs in protobuf format. That way, when debugging, we can reproduce issues locally from the data recorded on site.

message ImageData {
  // ms
  int64 timestamp = 1;
  int32 id = 2;
  Data mat = 3;
}

message PointCloud {
  // ms
  int64 timestamp = 1;
  int32 id = 2;
  PointData pointcloud = 3;
}

message State {
  // ms
  int64 timestamp = 1;
  string direction = 2;
}

message Sensor {
  repeated PointCloud point_data = 1;
  repeated ImageData image_data = 2;
  repeated State vehicle_data = 3;
}

This is the set of messages we defined. Because the three data sources in Sensor run at different frame rates, each Sensor we write actually contains only one kind of data; the other two repeated fields are left empty.

We had no problems while we recorded everything into a single package (one recording file). But a single package cannot keep growing during a long recording, so we needed a way to split the data across several packages.

I assumed this would be simple: once a package reaches 500 MB, subsequent data goes into a new package. Writing the code went smoothly, and we deployed it on site to record data. After recording for a while, we brought the packages back to replay them against our new program, and found that parsing failed for some of them: the program would get stuck partway through. Repeated tests confirmed that only some packages had the problem.

At first we suspected the way we checked the file size, because that check opened the file and might have interfered with the split. But after switching to several size checks that do not open the file, some of the recorded packages still failed.
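The splitting step described above can be sketched roughly as follows. This is not the project's actual code: the class name `RotatingWriter`, the file naming scheme, and the choice to count bytes in memory instead of querying the file are all assumptions made for illustration, and the tiny 10-byte limit stands in for the 500 MB threshold.

```python
import os
import tempfile


class RotatingWriter:
    """Append byte records to numbered files, starting a new file at a size limit.

    Hypothetical helper: it counts the bytes it has written itself, so it
    never needs to open or stat the file to decide when to split.
    """

    def __init__(self, directory, max_bytes):
        self.directory = directory
        self.max_bytes = max_bytes
        self.index = 0    # number of the current package file
        self.written = 0  # bytes written to the current package so far
        self.file = open(self._path(), "wb")

    def _path(self):
        return os.path.join(self.directory, f"package_{self.index:04d}.bin")

    def write(self, record):
        # Rotate before writing, so one record never spans two packages.
        if self.written > 0 and self.written + len(record) > self.max_bytes:
            self.file.close()
            self.index += 1
            self.written = 0
            self.file = open(self._path(), "wb")
        self.file.write(record)
        self.written += len(record)

    def close(self):
        self.file.close()


def demo_rotation():
    """Write five 4-byte records with a 10-byte limit; return the file sizes."""
    with tempfile.TemporaryDirectory() as directory:
        writer = RotatingWriter(directory, max_bytes=10)
        for i in range(5):
            writer.write(bytes([i]) * 4)
        writer.close()
        return [os.path.getsize(os.path.join(directory, name))
                for name in sorted(os.listdir(directory))]
```

Splitting by size only decides where a file ends. It does nothing to mark where each record inside a file ends.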

Only then did I suspect that protobuf has special requirements for storing data. Later I read some articles and learned that protobuf needs delimiters when multiple messages are stored in one file. The wire format is not self-delimiting, so when parsing back from the file, protobuf cannot tell where a single message ends, and the data is parsed incorrectly.

This is where the pit was. We wrote message after message into a single package with no delimiter between them. When protobuf parses the file, it treats the entire contents as one Sensor. That Sensor contains all the data, because parsing concatenated messages merges them: the entries of each repeated field are simply appended one after another.
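The merging behavior can be reproduced without the protobuf library by hand-encoding a few bytes of wire format. The following is a minimal sketch, not a general parser: it assumes a Sensor that carries only vehicle_data entries (field 3) and a State that carries only a non-zero timestamp (field 1), and all function names are made up for this illustration.

```python
def encode_varint(value):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos):
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_sensor_with_state(timestamp):
    """Serialize a Sensor holding exactly one State with the given timestamp."""
    # State.timestamp: field 1, wire type 0 (varint) -> tag byte 0x08
    state = b"\x08" + encode_varint(timestamp)
    # Sensor.vehicle_data: field 3, wire type 2 (length-delimited) -> tag 0x1A
    return b"\x1a" + encode_varint(len(state)) + state


def parse_sensor_timestamps(data):
    """Parse data as ONE Sensor; return the timestamp of every vehicle_data entry."""
    timestamps = []
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        if tag != 0x1A:
            raise ValueError("this sketch only handles Sensor.vehicle_data")
        length, pos = decode_varint(data, pos)
        state = data[pos:pos + length]
        pos += length
        state_tag, state_pos = decode_varint(state, 0)
        if state_tag != 0x08:
            raise ValueError("this sketch only handles State.timestamp")
        timestamp, _ = decode_varint(state, state_pos)
        timestamps.append(timestamp)
    return timestamps


# Two separate Sensor messages written back to back, with no delimiter.
file_contents = encode_sensor_with_state(1000) + encode_sensor_with_state(2000)
merged = parse_sensor_timestamps(file_contents)
```

Here `merged` holds both timestamps. The parser has no way to see that the bytes came from two messages, so it reports a single Sensor with two vehicle_data entries.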

Only then did I realize that the data from my earlier single-package recordings had come out correct purely by luck: protobuf just happened to parse it successfully.

How did I solve it?

Now that we know protobuf behaves this way, all we need is a way to delimit protobuf messages. That turned out to be really hard to find, because very few people use protobuf the way we do. Searching in Chinese turned up nothing at all. Perhaps hardly anyone uses protobuf to store data; most people probably use it for communication between services.

I finally found the solution in some answers on Stack Overflow. From them I learned that it was only officially merged in protobuf 3.3, so this feature really does seem to be rarely used.

// Declared in google/protobuf/util/delimited_message_util.h,
// namespace google::protobuf::util.
bool SerializeDelimitedToOstream(const MessageLite& message,
                                 std::ostream* output);
bool ParseDelimitedFromZeroCopyStream(MessageLite* message,
                                      io::ZeroCopyInputStream* input,
                                      bool* clean_eof);

With this pair of functions, messages can be written to a stream and read back from it one at a time. There is no more worry about the data being merged when it is read.

Of course, data stored this way cannot be parsed with the original parsing method, because the storage format has changed completely. These functions write the size of the serialized message first (as a varint) and then the message bytes themselves.
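The size-then-bytes layout is easy to illustrate on its own. The sketch below is not the protobuf implementation, only a standalone imitation of the same framing idea. The names `write_delimited`, `read_delimited`, and `read_all` are invented here, and the payloads are arbitrary bytes standing in for serialized messages.

```python
import io


def write_delimited(stream, payload):
    """Write one record: the payload size as a varint, then the payload bytes."""
    size = len(payload)
    while size >= 0x80:
        stream.write(bytes([(size & 0x7F) | 0x80]))  # low 7 bits, more to come
        size >>= 7
    stream.write(bytes([size]))
    stream.write(payload)


def read_delimited(stream):
    """Read one record; return None on a clean end of stream."""
    size = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None  # ended exactly between two records
            raise EOFError("stream ended inside a size prefix")
        size |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    payload = stream.read(size)
    if len(payload) != size:
        raise EOFError("stream ended inside a record")
    return payload


def read_all(stream):
    """Read records until the stream ends cleanly."""
    records = []
    while True:
        record = read_delimited(stream)
        if record is None:
            return records
        records.append(record)


buffer = io.BytesIO()
for payload in (b"first", b"", b"x" * 300):
    write_delimited(buffer, payload)
buffer.seek(0)
records = read_all(buffer)
```

Because every record carries its own length, the reader gets back three separate records, including the empty one, instead of one merged blob. Returning None only when the stream ends exactly on a record boundary plays a role similar to the clean_eof flag in the C++ declaration.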

Conclusion

After a lot of back and forth, I finally climbed out of this message-splitting pit. The use case is probably fairly niche, so much of the information simply cannot be found by searching. I worked these problems out by reading the source code myself. The C++ source is really hard to read: there are many template functions and template classes, and it is easy to miss details. In the end I read the C# code and was finally able to confirm it.


Statement
This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to have it removed.