stl - C/C++ question about bit-level operations (also looking for good libraries and methods)

I am currently working on a project involving content compression, which requires processing data at the binary (bit) level (for example LZ77 compression and Huffman coding) and involves a lot of functions.

But when I tried it myself, I found that although C provides a fairly complete set of bitwise operations, they all work on values of a fixed data type (such as char). A char is 8 bits by default, so operations that span char boundaries are inconvenient and require a lot of case checks.

Maybe I have not expressed this clearly, or I have misunderstood something about C, but I am hoping to find a smoother way to handle binary data. Or is there a library that is easier to use?

Can anyone with experience in this area offer some advice? I would be very grateful. If it helps, I will send you a red envelope privately to express my gratitude.

天蓬老师 · 2761 days ago · 831

All replies (2)

  • 巴扎黑

    巴扎黑 2017-06-05 11:13:09

    If you want a larger character representation range, you can use wchar_t.
    If you want a type whose "bytes" can start at an arbitrary bit position in memory, that is definitely not possible.

    But you can do this:
    `nl = ((arr[0] & 0xc0) >> 6);
    nh = ((arr[1] & 0x0f) << 2);
    n = nh | nl;`
    This combines the high 2 bits of the previous byte and the low 4 bits of the next byte into a 6-bit value.
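
    The same idea can be generalized to any bit offset and width. The following is only a minimal sketch, not a definitive implementation: `read_bits` is a hypothetical helper name, and it assumes bits are numbered from the least significant bit of `data[0]`, which is the convention under which it yields the same 6-bit value as the snippet above (reading the mask there as 0xc0).

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdint>

    // Hypothetical helper: read `count` bits (1..32) starting at absolute bit
    // position `bit_pos` in `data`. Bit 0 is the least significant bit of
    // data[0]; the result is assembled starting from its own bit 0.
    std::uint32_t read_bits(const std::uint8_t* data, std::size_t size,
                            std::size_t bit_pos, unsigned count) {
        assert(count >= 1 && count <= 32);
        assert(bit_pos + count <= size * 8);  // stay inside the buffer
        std::uint32_t value = 0;
        for (unsigned i = 0; i < count; ++i) {
            std::size_t pos = bit_pos + i;
            // Pick the byte with pos / 8 and the bit inside it with pos % 8.
            std::uint32_t bit = (data[pos / 8] >> (pos % 8)) & 1u;
            value |= bit << i;
        }
        return value;
    }
    ```

    With `arr` holding two bytes, `read_bits(arr, 2, 6, 6)` reads the top 2 bits of `arr[0]` followed by the low 4 bits of `arr[1]`. Reading one bit at a time is slow but easy to verify; a real codec would likely shift and mask whole bytes.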

  • 给我你的怀抱

    给我你的怀抱 2017-06-05 11:13:09

    If you are already working at the binary level, why still think in terms of characters?
    Binary data should only be considered as bytes, right?
    Compression algorithms usually do not deal with characters, because the input file may be ASCII, UTF-8, UTF-16 (LE/BE), GBK, and so on. In the end, what all languages have in common is that they show statistical regularities.
    For example, in English the letter e appears frequently, as do words like "is" and affixes such as "se" and "tor".
    The fundamental principle of compression is to use as few bits as possible to represent things that appear frequently, thereby reducing redundancy, so it generally has nothing to do with characters.
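
    To make "fewer bits for frequent things" concrete, here is a rough sketch that counts byte frequencies and derives Huffman code lengths from them. It treats the input purely as bytes, with no notion of characters. The function name `huffman_code_lengths` is made up for illustration, and it only computes code lengths, not the actual codes.

    ```cpp
    #include <array>
    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>

    // Hypothetical helper: for each of the 256 byte values, return the length in
    // bits of its Huffman code (0 for bytes that never occur in the input).
    std::array<unsigned, 256> huffman_code_lengths(const std::vector<std::uint8_t>& input) {
        std::array<std::size_t, 256> freq{};
        for (std::uint8_t b : input) ++freq[b];

        // Tree nodes: indices 0..255 are leaves, later indices are internal nodes.
        std::vector<int> parent(256, -1);
        using Item = std::pair<std::size_t, int>;  // (weight, node index)
        std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
        for (int s = 0; s < 256; ++s)
            if (freq[s] > 0) heap.push({freq[s], s});

        // Repeatedly merge the two lightest nodes under a new parent.
        while (heap.size() > 1) {
            Item a = heap.top(); heap.pop();
            Item b = heap.top(); heap.pop();
            int node = static_cast<int>(parent.size());
            parent.push_back(-1);
            parent[a.second] = node;
            parent[b.second] = node;
            heap.push({a.first + b.first, node});
        }

        // A leaf's code length is its depth in the tree.
        std::array<unsigned, 256> lengths{};
        for (int s = 0; s < 256; ++s) {
            if (freq[s] == 0) continue;
            unsigned depth = 0;
            for (int n = s; parent[n] != -1; n = parent[n]) ++depth;
            assert(depth <= 255);  // at most 256 leaves, so depth is bounded
            lengths[s] = depth > 0 ? depth : 1;  // a lone symbol still needs 1 bit
        }
        return lengths;
    }
    ```

    For the input bytes "aaaabbc", the most frequent byte 'a' gets a 1-bit code while 'b' and 'c' get 2 bits each, so the 7 bytes fit in 10 bits instead of 56.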

    You should learn more about compression algorithms. LZ77, LZ78, and Huffman coding are all general-purpose compression algorithms; they are not limited to text and work on bits.

    Back to C++: C++ requires the size of a type to be known at compile time (the layout of stack-allocated memory is determined by the compiler), but the input text is only known at runtime, so in theory there is no good built-in way to do this. Character sets and text encodings are a deep pit, and I do not recommend jumping in.

    If you want to operate in units of bits, wrap the logic in a small library of your own, or use an existing library such as Boost. After all, the smallest unit the CPU can address is one byte; individual bits inside it can only be reached through bitwise operations.
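
    As a rough idea of what such a self-made wrapper might look like, here is a minimal sketch using only the standard library. The class names `BitWriter` and `BitReader` are hypothetical, and the choice to fill each byte from its least significant bit upward is an assumption; a real format such as DEFLATE fixes its own bit order.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Hypothetical bit-level writer: appends values of 1..32 bits to a byte
    // buffer, filling each byte from its least significant bit upward.
    class BitWriter {
    public:
        void write(std::uint32_t value, unsigned count) {
            assert(count >= 1 && count <= 32);
            for (unsigned i = 0; i < count; ++i) {
                if (bit_count_ % 8 == 0) bytes_.push_back(0);  // start a new byte
                std::uint32_t bit = (value >> i) & 1u;
                bytes_.back() |= static_cast<std::uint8_t>(bit << (bit_count_ % 8));
                ++bit_count_;
            }
        }
        const std::vector<std::uint8_t>& bytes() const { return bytes_; }
        std::size_t bit_count() const { return bit_count_; }

    private:
        std::vector<std::uint8_t> bytes_;
        std::size_t bit_count_ = 0;
    };

    // Hypothetical matching reader: returns values in the order they were written.
    // It keeps a reference, so the buffer must outlive the reader.
    class BitReader {
    public:
        explicit BitReader(const std::vector<std::uint8_t>& bytes) : bytes_(bytes) {}
        std::uint32_t read(unsigned count) {
            assert(count >= 1 && count <= 32);
            assert(pos_ + count <= bytes_.size() * 8);  // stay inside the buffer
            std::uint32_t value = 0;
            for (unsigned i = 0; i < count; ++i, ++pos_) {
                std::uint32_t bit = (bytes_[pos_ / 8] >> (pos_ % 8)) & 1u;
                value |= bit << i;
            }
            return value;
        }

    private:
        const std::vector<std::uint8_t>& bytes_;
        std::size_t pos_ = 0;
    };
    ```

    Writing a 3-bit value, then a 10-bit value, then a single bit produces 14 bits packed into 2 bytes, and reading with the same widths returns the original values. Callers never have to check whether a value crosses a byte boundary, which is the kind of case checking the question wanted to avoid.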
