Process bitstream-based data using C#

高洛峰
2016-10-14 17:03:57

0x00 Background

Recently I needed to process some bit stream-based data. Computers generally handle data in bytes (8 bits), and the same goes for data read with BinaryReader: even reading a bool consumes a whole byte. With a few methods from the C# base class library, however, data can also be read bit by bit. After finishing the task I found bit-level data quite interesting, so I also tried encoding common ASCII characters with 7-bit and 6-bit encodings. I am writing it up as a blog post, partly as a record and partly in the hope that it helps readers with similar needs.

0x01 Reading of bit stream data

Suppose we have a byte b = 35 and need to read its first 4 bits and its last 4 bits as two separate numbers. The base class library has no ready-made method for this, but it can be done in two steps with a binary string.

1. Represent b as the binary string 00100011.

2. Convert the first 4 characters and the last 4 characters into numbers. The core call is:

Convert.ToInt32("0010", 2);

This gives us bit-based reading.
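The two steps can be sketched end to end as follows. This is a minimal illustration rather than the author's code: it uses Convert.ToString with PadLeft for step 1, and the variable names bin, high and low are only chosen for this example.

```csharp
using System;

byte b = 35;

// Step 1: binary string, left-padded to 8 characters ("100011" -> "00100011").
string bin = Convert.ToString(b, 2).PadLeft(8, '0');

// Step 2: parse each 4-character half as a base-2 number.
int high = Convert.ToInt32(bin.Substring(0, 4), 2); // "0010"
int low = Convert.ToInt32(bin.Substring(4, 4), 2);  // "0011"

Console.WriteLine($"{bin} -> {high}, {low}"); // 00100011 -> 2, 3
```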

There are several ways to convert a byte into a binary string in the first step:

1. The simplest is Convert.ToString(b, 2). The result drops leading zeros, so if it is shorter than 8 characters, pad it with 0 on the left.

2. You can AND the byte with 1, 2, 4, 8 ... 128 in turn and take out the bits from low to high.

3. You can AND the byte with 128, shift the byte left by one bit, and AND with 128 again, taking out the bits from high to low.

The first method creates a lot of string objects. I did not notice much difference between the second and third, and picked the third purely on instinct. The code is as follows:

public static char[] ByteToBinString(byte b)
{
  var result = new char[8];
  for (int i = 0; i < 8; i++)
  {
    var temp = b & 128;
    result[i] = temp == 0 ? '0' : '1';
    b = (byte)(b << 1);
  }
  return result;
}
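
For comparison, the second approach (masking with 1, 2, 4 ... 128) might look like the sketch below. The name ByteToBinStringMask is hypothetical and does not come from the original code. Because the bits are taken from low to high, each one is stored from the end of the array backwards.

```csharp
using System;

// Sketch of approach 2: test each bit with a mask, from the low bit (1) up to the high bit (128).
static char[] ByteToBinStringMask(byte b)
{
    var result = new char[8];
    for (int i = 0; i < 8; i++)
    {
        int mask = 1 << i;                            // 1, 2, 4, ..., 128
        result[7 - i] = (b & mask) == 0 ? '0' : '1';  // lowest bit goes last
    }
    return result;
}

Console.WriteLine(new string(ByteToBinStringMask(35))); // 00100011
```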

To convert a byte[] into a binary string, you can do the following:

public static string BytesToBinString(byte[] data)
{
    // 8 characters per byte, so reserve the full capacity up front.
    var binString = new StringBuilder(data.Length * 8);
    for (int i = 0; i < data.Length; i++)
    {
        binString.Append(ByteToBinString(data[i]));
    }
    return binString.ToString();
}

With this, once the byte[] data is available it can be converted into a binary string and kept. Values are then read from the binary string by bit offset and bit length and converted to bool, Int16, Int32 and so on. Based on this idea you can write a BitReader class that stores the binary string in a StringBuilder and provides Read methods on top of it. To handle the data as a stream, a Position property records the current offset. For example, ReadInt16 reads 16 bits starting at Position, converts them to an Int16, returns the value, and advances Position by 16 bits. The rule is: overloads that take an explicit start offset leave Position unchanged, while overloads that read from the current Position advance it. Part of the BitReader class is as follows:

public class BitReader
{
    public readonly StringBuilder BinString;
    public int Position { get; set; }

    public BitReader(byte[] data)
    {
        BinString = new StringBuilder(data.Length * 8);
        for (int i = 0; i < data.Length; i++)
        {
            BinString.Append(ByteToBinString(data[i]));
        }
        Position = 0;
    }

    public byte ReadByte(int offset)
    {
        var bin = BinString.ToString(offset, 8);
        return Convert.ToByte(bin, 2);
    }

    public byte ReadByte()
    {
        var result = ReadByte(Position);
        Position += 8;
        return result;
    }

    public int ReadInt(int offset, int bitLength)
    {
        var bin = BinString.ToString(offset, bitLength);
        return Convert.ToInt32(bin, 2);
    }

    public int ReadInt(int bitLength)
    {
        var result = ReadInt(Position, bitLength);
        Position += bitLength;
        return result;
    }

    public static char[] ByteToBinString(byte b)
    {
        var result = new char[8];
        for (int i = 0; i < 8; i++)
        {
            var temp = b & 128;
            result[i] = temp == 0 ? '0' : '1';
            b = (byte)(b << 1);
        }
        return result;
    }
}

Reading data from byte[] buff = {35, 12} with BitReader looks like this:

var buff = new byte[] { 35, 12 };
var reader = new BitReader(buff); // binary string is 0010001100001100

var num1 = reader.ReadInt(4);   // reads 4 bits from the current Position as an int; Position advances by 4; result is 2, Position = 4

var num2 = reader.ReadInt(5,6);  // reads 6 bits starting at bit offset 5 as an int; Position does not move; result is 24 (011000), Position = 4

var b = reader.ReadBool();  // reads 1 bit from the current Position as a bool; Position advances by 1; result is False, Position = 5
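
ReadBool, and the Remain member used by the decoders later on, are not part of the partial listing above. The sketch below shows one way they could behave, written as local functions over the same bit string so that it runs on its own. The implementations are assumptions: in the real class, Remain is presumably a property and ReadBool a pair of overloads that follow the same offset rule as ReadInt.

```csharp
using System;
using System.Text;

// Same bit string as the example: bytes {35, 12}.
var binString = new StringBuilder("0010001100001100");
int position = 4; // state after ReadInt(4)

// Assumed meaning of Remain: number of unread bits from the current position.
int Remain() => binString.Length - position;

// Assumed ReadBool(offset): one bit at an explicit offset; position is unchanged.
bool ReadBoolAt(int offset) => binString[offset] == '1';

// Assumed ReadBool(): one bit at the current position, then advance by 1 bit.
bool ReadBool()
{
    var result = ReadBoolAt(position);
    position += 1;
    return result;
}

bool flag = ReadBool();
Console.WriteLine($"{flag}, position={position}, remain={Remain()}"); // False, position=5, remain=11
```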

0x02 Writing bit stream data

Writing data to a bit stream is the reverse process. It is implemented by the BitWriter class, which keeps the binary string in a StringBuilder. When writing a value you pass in the value and the number of bits used to store it. When writing is finished, the binary string in the StringBuilder is padded to a multiple of 8 bits, converted to a byte[] 8 bits at a time, and returned. The core of BitWriter is as follows:

public class BitWriter
{
    public readonly StringBuilder BinString;

    public BitWriter()
    {
        BinString = new StringBuilder();
    }

    public BitWriter(int bitLength)
    {
        var add = 8 - bitLength % 8;
        BinString = new StringBuilder(bitLength + add);
    }

    public void WriteByte(byte b, int bitLength=8)
    {
        var bin = Convert.ToString(b, 2);
        AppendBinString(bin, bitLength);
    }

    public void WriteInt(int i, int bitLength)
    {
        var bin = Convert.ToString(i, 2);
        AppendBinString(bin, bitLength);
    }

    public void WriteChar7(char c)
    {
        var b = Convert.ToByte(c);
        var bin = Convert.ToString(b, 2);
        AppendBinString(bin, 7);
    }

    public byte[] GetBytes()
    {
        Check8();
        var len = BinString.Length / 8;
        var result = new byte[len];

        for (int i = 0; i < len; i++)
        {
            var bits = BinString.ToString(i * 8, 8);
            result[i] = Convert.ToByte(bits, 2);
        }

        return result;
    }

    public string GetBinString()
    {
        Check8();
        return BinString.ToString();
    }


    private void AppendBinString(string bin, int bitLength)
    {
        if (bin.Length > bitLength)
            throw new Exception("len is too short");
        var add = bitLength - bin.Length;
        for (int i = 0; i < add; i++)
        {
            BinString.Append('0');
        }
        BinString.Append(bin);
    }

    private void Check8()
    {
        // Pad with zeros up to the next multiple of 8 bits; add nothing if already aligned.
        var add = (8 - BinString.Length % 8) % 8;
        for (int i = 0; i < add; i++)
        {
            BinString.Append('0');
        }
    }
}

Here is a simple example:

var writer = new BitWriter();

writer.WriteInt(12, 5);  // write 12 using 5 bits; the binary string is now 01100

writer.WriteInt(8, 16);  // write 8 using 16 bits; the binary string is now 011000000000000001000

var result = writer.GetBytes(); // aligned to 8 bits: 011000000000000001000000
                                // returns [96,0,64]
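
The same packing can also be done with shifts instead of an intermediate string. The sketch below is an alternative technique, not the BitWriter approach described above. PackBits is a hypothetical helper, and unlike AppendBinString it silently drops high bits that do not fit into the given bit length instead of throwing.

```csharp
using System;
using System.Collections.Generic;

// Packs (value, bitLength) fields into bytes, most significant bit first, using shifts.
static byte[] PackBits((int, int)[] fields)
{
    var bytes = new List<byte>();
    int current = 0; // bits collected for the byte in progress
    int filled = 0;  // how many bits of 'current' are in use

    foreach (var (value, bitLength) in fields)
    {
        for (int i = bitLength - 1; i >= 0; i--)
        {
            int bit = (value >> i) & 1;
            current = (current << 1) | bit;
            filled++;
            if (filled == 8)
            {
                bytes.Add((byte)current);
                current = 0;
                filled = 0;
            }
        }
    }

    // Pad the last partial byte with zeros in the low bits.
    if (filled > 0)
        bytes.Add((byte)(current << (8 - filled)));

    return bytes.ToArray();
}

var packed = PackBits(new[] { (12, 5), (8, 16) });
Console.WriteLine(string.Join(",", packed)); // 96,0,64
```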

0x03 7-bit character encoding

ASCII characters are usually stored in 8 bits, but ASCII itself only defines 7-bit codes, so the highest bit is always 0. For English text we can therefore re-encode with 7 bits per character without losing information. The encoding process takes the characters in order, writes each one as 7 bits with BitWriter, and finally returns the newly encoded byte[]. To be able to read the data back correctly, we define a header: an 8-bit value of 2 marks the start of the data, and the following 16 bits hold the number of characters that follow. The code is as follows:

public byte[] Encode(string text)
{
    // 24 header bits (8-bit start marker + 16-bit length) plus 7 bits per character.
    var len = text.Length * 7 + 24;

    var writer = new BitWriter(len);
    writer.WriteByte(2);
    writer.WriteInt(text.Length, 16);

    for (int i = 0; i < text.Length; i++)
    {
        var b = Convert.ToByte(text[i]);
        writer.WriteByte(b, 7);
    }

    return writer.GetBytes();
}

When reading data, we first look for the start identifier, then read out the number of characters, and read the characters in sequence according to the number of characters. The code is as follows:

public string Decode(byte[] data)
{
    var reader = new BitReader(data);
    // Skip bytes until the start marker (2) is found.
    while (reader.Remain > 8)
    {
        var start = reader.ReadByte();
        if (start == 2)
            break;
    }
    // The next 16 bits hold the character count.
    var len = reader.ReadInt(16);
    var result = new StringBuilder(len);
    for (int i = 0; i < len; i++)
    {
        var b = reader.ReadInt(7);
        var ch = Convert.ToChar(b);
        result.Append(ch);
    }

    return result.ToString();
}

Because of the 3-byte data header, encoding only a few characters makes the data longer. The more characters there are, however, the more space the encoding saves.
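
The break-even point follows from the bit counts: n characters take 8n bits unencoded and 24 + 7n bits encoded, so the encoding only starts to save space once n exceeds 24. A small sketch of that arithmetic, assuming the output is padded only up to the next byte boundary (Encoded7BitLength is a hypothetical helper name):

```csharp
using System;

// Bytes needed for n characters: 24 header bits plus 7 bits per character,
// rounded up to whole bytes.
static int Encoded7BitLength(int charCount) => (24 + 7 * charCount + 7) / 8;

foreach (var n in new[] { 1, 24, 32, 1000 })
{
    Console.WriteLine($"{n} chars: {n} bytes as 8-bit, {Encoded7BitLength(n)} bytes as 7-bit");
}
// 1 chars: 1 bytes as 8-bit, 4 bytes as 7-bit
// 24 chars: 24 bytes as 8-bit, 24 bytes as 7-bit
// 32 chars: 32 bytes as 8-bit, 31 bytes as 7-bit
// 1000 chars: 1000 bytes as 8-bit, 878 bytes as 7-bit
```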

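0x04 6-bit character encoding

From the point of view of saving space, if some information loss is acceptable, for example losing letter case, the number of bits per character can be reduced further. 26 letters + 10 digits + symbols fit into 6 bits (64 values). This encoding can no longer use the ASCII mapping, so we define our own: for example mapping 0-9 to the ten digits and so on, or using a custom dictionary, the legendary code book. Anyone who watches Chinese spy dramas will know code books: a code book is just a dictionary that remaps characters, and the plaintext is recovered by looking characters up in it. This is a simple monoalphabetic substitution; it is very weak as encryption and is easily broken by statistical analysis once enough samples have been collected. Below we try re-encoding with 6 bits based on a custom dictionary.

Encoding process:

The message header is written as in the 7-bit encoding. Then each character of the text is taken in turn, looked up in the dictionary, and its index is written to the BitWriter as 6 bits.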

public byte[] Encode(string text)
{
    // The dictionary has no lowercase letters, so case is dropped here.
    text = text.ToUpper();
    var len = text.Length * 6 + 24;

    var writer = new BitWriter(len);
    writer.WriteByte(2);
    writer.WriteInt(text.Length, 16);

    for (int i = 0; i < text.Length; i++)
    {
        var index = GetChar6Index(text[i]);
        writer.WriteInt(index, 6);
    }

    return writer.GetBytes();
}

private int GetChar6Index(char c)
{
    for (int i = 0; i < 64; i++)
    {
        if (Dict.Custom[i] == c)
            return i;
    }
    return 10; // character not in the dictionary: fall back to index 10, which is '*'
}
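
Dict.Custom itself is not shown in the article. The sketch below uses a made-up 64-entry dictionary to illustrate the lookup. The only details taken from the text are that the digits sit at indices 0-9 and that index 10 is '*'; the rest of the layout is an assumption, and string.IndexOf stands in for the explicit loop.

```csharp
using System;

// Hypothetical dictionary: 10 digits, '*', 26 uppercase letters, 27 symbols = 64 entries.
const string Custom =
    "0123456789*" +
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
    " .,;:!?'\"-+/=()[]<>@#$%&_{}";

// Returns the 6-bit index of c, or 10 ('*') when c is not in the dictionary.
int GetChar6Index(char c)
{
    var index = Custom.IndexOf(c);
    return index >= 0 ? index : 10;
}

Console.WriteLine(Custom.Length);       // 64
Console.WriteLine(GetChar6Index('A'));  // 11
Console.WriteLine(GetChar6Index('\n')); // 10, line breaks are not in the dictionary
```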

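Decoding process:

Decoding is just as simple: find the message header, read the data 6 bits at a time, and look up the corresponding character in the dictionary: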

public string Decode(byte[] data)
{
    var reader = new BitReader(data);
    while(reader.Remain > 8)
    {
        var start = reader.ReadByte();
        if (start == 2)
            break;
    }
    var len = reader.ReadInt(16);
    var result = new StringBuilder(len);
    for (int i = 0; i < len; i++)
    {
        var index = reader.ReadInt(6);
        var ch = Dict.Custom[index];
        result.Append(ch);
    }

    return result.ToString();
}

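The same text is shorter still when encoded with the 6-bit custom dictionary, at the cost of losing case and formatting such as line breaks.

From an encryption point of view, you could define N custom dictionaries (say 10) and use M bits in the message header (for example 4 bits) to indicate which dictionary is used. Each message is then encoded with a randomly chosen dictionary and decoded with the dictionary selected by those 4 bits; rotating the dictionaries regularly makes the encoding harder to break. Interested readers can try this themselves.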

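0x05 Final thoughts

The above is what I learned from processing bit stream data. It is just one approach I came up with, and it met my needs. If there is a more efficient or more sensible approach, I would be glad to hear about it. The encoding and decoding examples were written for fun and are unlikely to be useful in practice: bandwidth is plentiful these days, and there are many far more reliable ways to encrypt data.

Sample code: https://github.com/durow/TestArea/tree/master/BitStream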

