search
HomeBackend DevelopmentC++How to implement data compression and decompression algorithms in C++?

How to implement data compression and decompression algorithms in C++?

Aug 25, 2023 pm 06:54 PM
Compression algorithmc++ programmingDecompression algorithm

How to implement data compression and decompression algorithms in C++?

How to implement data compression and decompression algorithms in C?

Abstract: Data compression and decompression is one of the most important technologies in the computer field. This article will introduce how to use C to implement data compression and decompression algorithms, and provide code examples for readers' reference.

1. Data compression algorithm

The data compression algorithm can encode a large amount of data to reduce the occupation of storage space and transmission bandwidth. In C, we can use Huffman coding and LZ77 algorithm to achieve data compression.

1.1 Huffman coding

Huffman coding is a frequency-based data compression algorithm. It assigns a shorter code to each character based on the frequency of data occurrence to achieve the purpose of compressing data.

The sample code is as follows:

#include<iostream>
#include<queue>
#include<string>
#include<unordered_map>
using namespace std;

// Huffman树的节点
struct Node {
    char ch;
    int freq;
    Node* left;
    Node* right;
};

// 用于比较树节点的优先队列
class Compare {
public:
    bool operator() (Node* a, Node* b) {
        return a->freq > b->freq;
    }
};

// 生成Huffman树
Node* generateHuffmanTree(string text) {
    // 统计每个字符出现的频率
    unordered_map<char, int> freqTable;
    for (char ch : text) {
        freqTable[ch]++;
    }

    // 将频率和字符转换为Huffman树节点
    priority_queue<Node*, vector<Node*>, Compare> pq;
    for (auto it = freqTable.begin(); it != freqTable.end(); it++) {
        Node* node = new Node();
        node->ch = it->first;
        node->freq = it->second;
        node->left = nullptr;
        node->right = nullptr;
        pq.push(node);
    }

    // 构建Huffman树
    while (pq.size() > 1) {
        Node* left = pq.top();
        pq.pop();
        Node* right = pq.top();
        pq.pop();
        Node* parent = new Node();
        parent->ch = '';
        parent->freq = left->freq + right->freq;
        parent->left = left;
        parent->right = right;
        pq.push(parent);
    }

    return pq.top();
}

// 生成Huffman编码表
void generateHuffmanCodeTable(Node* root, string code, unordered_map<char, string>& codeTable) {
    if (root == nullptr) {
        return;
    }

    if (root->ch != '') {
        codeTable[root->ch] = code;
    }

    generateHuffmanCodeTable(root->left, code + "0", codeTable);
    generateHuffmanCodeTable(root->right, code + "1", codeTable);
}

// 压缩数据
string compressData(string text, unordered_map<char, string>& codeTable) {
    string compressedData;

    for (char ch : text) {
        compressedData += codeTable[ch];
    }

    return compressedData;
}


int main() {
    string text = "Hello, World!";
    Node* root = generateHuffmanTree(text);
    unordered_map<char, string> codeTable;
    generateHuffmanCodeTable(root, "", codeTable);
    string compressedData = compressData(text, codeTable);

    cout << "Compressed Data: " << compressedData << endl;

    return 0;
}

1.2 LZ77 algorithm

The LZ77 algorithm is a dictionary-based data compression algorithm. It replaces recurring data fragments with pointers to old data to reduce the storage space of the data.

The sample code is as follows:

#include<iostream>
#include<string>
#include<vector>
using namespace std;

// 压缩数据
string compressData(string text) {
    string compressedData;
    int i = 0;

    while (i < text.length()) {
        int len = 0;
        int offset = 0;

        for (int j = 0; j < i; j++) {
            int k = 0;
            while (i + k < text.length() && text[j + k] == text[i + k]) {
                k++;
            }

            if (k > len) {
                len = k;
                offset = i - j;
            }
        }

        if (len > 0) {
            compressedData += "(" + to_string(offset) + "," + to_string(len) + ")";
            i += len;
        } else {
            compressedData += text[i];
            i++;
        }
    }

    return compressedData;
}


int main() {
    string text = "ababaabababbbb";
    string compressedData = compressData(text);

    cout << "Compressed Data: " << compressedData << endl;

    return 0;
}

2. Data decompression algorithm

The data decompression algorithm is used to restore compressed data. In C, we can use the corresponding decompression algorithm to restore the data.

2.1 Huffman decompression

The sample code is as follows:

#include<iostream>
#include<string>
#include<unordered_map>
using namespace std;

// 解压缩数据
string decompressData(string compressedData, unordered_map<string, char>& codeTable) {
    string decompressedData;
    string code;

    for (char ch : compressedData) {
        code += ch;

        if (codeTable.count(code) > 0) {
            decompressedData += codeTable[code];
            code = "";
        }
    }

    return decompressedData;
}

int main() {
    string compressedData = "010101001111011001";
    unordered_map<string, char> codeTable = { {"0", 'a'}, {"10", 'b'}, {"110", 'c'}, {"1110", 'd'}, {"1111", 'e'} };
    string decompressedData = decompressData(compressedData, codeTable);

    cout << "Decompressed Data: " << decompressedData << endl;

    return 0;
}

2.2 LZ77 decompression

The sample code is as follows:

#include<iostream>
#include<string>
#include<vector>
using namespace std;

// 解压缩数据
string decompressData(string compressedData) {
    string decompressedData;
    int i = 0;

    while (i < compressedData.length()) {
        if (compressedData[i] == '(') {
            int j = i + 1;

            while (compressedData[j] != ',') {
                j++;
            }

            int offset = stoi(compressedData.substr(i + 1, j - i - 1));
            int k = j + 1;

            while (compressedData[k] != ')') {
                k++;
            }

            int len = stoi(compressedData.substr(j + 1, k - j - 1));

            for (int l = 0; l < len; l++) {
                decompressedData += decompressedData[decompressedData.length() - offset];
            }

            i = k + 1;
        } else {
            decompressedData += compressedData[i];
            i++;
        }
    }

    return decompressedData;
}

int main() {
    string compressedData = "a(1,1)ab(3,3)b(9,2)";
    string decompressedData = decompressData(compressedData);

    cout << "Decompressed Data: " << decompressedData << endl;

    return 0;
}

Conclusion:

This article introduces how to use C to implement data compression and decompression algorithms. With Huffman coding and LZ77 algorithm, we are able to compress and decompress data efficiently. Readers can choose the algorithm that suits them according to their needs, and practice and optimize based on the sample code.

The above is the detailed content of How to implement data compression and decompression algorithms in C++?. For more information, please follow other related articles on the PHP Chinese website!

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
C   XML Libraries: Comparing and Contrasting OptionsC XML Libraries: Comparing and Contrasting OptionsApr 22, 2025 am 12:05 AM

There are four commonly used XML libraries in C: TinyXML-2, PugiXML, Xerces-C, and RapidXML. 1.TinyXML-2 is suitable for environments with limited resources, lightweight but limited functions. 2. PugiXML is fast and supports XPath query, suitable for complex XML structures. 3.Xerces-C is powerful, supports DOM and SAX resolution, and is suitable for complex processing. 4. RapidXML focuses on performance and parses extremely fast, but does not support XPath queries.

C   and XML: Exploring the Relationship and SupportC and XML: Exploring the Relationship and SupportApr 21, 2025 am 12:02 AM

C interacts with XML through third-party libraries (such as TinyXML, Pugixml, Xerces-C). 1) Use the library to parse XML files and convert them into C-processable data structures. 2) When generating XML, convert the C data structure to XML format. 3) In practical applications, XML is often used for configuration files and data exchange to improve development efficiency.

C# vs. C  : Understanding the Key Differences and SimilaritiesC# vs. C : Understanding the Key Differences and SimilaritiesApr 20, 2025 am 12:03 AM

The main differences between C# and C are syntax, performance and application scenarios. 1) The C# syntax is more concise, supports garbage collection, and is suitable for .NET framework development. 2) C has higher performance and requires manual memory management, which is often used in system programming and game development.

C# vs. C  : History, Evolution, and Future ProspectsC# vs. C : History, Evolution, and Future ProspectsApr 19, 2025 am 12:07 AM

The history and evolution of C# and C are unique, and the future prospects are also different. 1.C was invented by BjarneStroustrup in 1983 to introduce object-oriented programming into the C language. Its evolution process includes multiple standardizations, such as C 11 introducing auto keywords and lambda expressions, C 20 introducing concepts and coroutines, and will focus on performance and system-level programming in the future. 2.C# was released by Microsoft in 2000. Combining the advantages of C and Java, its evolution focuses on simplicity and productivity. For example, C#2.0 introduced generics and C#5.0 introduced asynchronous programming, which will focus on developers' productivity and cloud computing in the future.

C# vs. C  : Learning Curves and Developer ExperienceC# vs. C : Learning Curves and Developer ExperienceApr 18, 2025 am 12:13 AM

There are significant differences in the learning curves of C# and C and developer experience. 1) The learning curve of C# is relatively flat and is suitable for rapid development and enterprise-level applications. 2) The learning curve of C is steep and is suitable for high-performance and low-level control scenarios.

C# vs. C  : Object-Oriented Programming and FeaturesC# vs. C : Object-Oriented Programming and FeaturesApr 17, 2025 am 12:02 AM

There are significant differences in how C# and C implement and features in object-oriented programming (OOP). 1) The class definition and syntax of C# are more concise and support advanced features such as LINQ. 2) C provides finer granular control, suitable for system programming and high performance needs. Both have their own advantages, and the choice should be based on the specific application scenario.

From XML to C  : Data Transformation and ManipulationFrom XML to C : Data Transformation and ManipulationApr 16, 2025 am 12:08 AM

Converting from XML to C and performing data operations can be achieved through the following steps: 1) parsing XML files using tinyxml2 library, 2) mapping data into C's data structure, 3) using C standard library such as std::vector for data operations. Through these steps, data converted from XML can be processed and manipulated efficiently.

C# vs. C  : Memory Management and Garbage CollectionC# vs. C : Memory Management and Garbage CollectionApr 15, 2025 am 12:16 AM

C# uses automatic garbage collection mechanism, while C uses manual memory management. 1. C#'s garbage collector automatically manages memory to reduce the risk of memory leakage, but may lead to performance degradation. 2.C provides flexible memory control, suitable for applications that require fine management, but should be handled with caution to avoid memory leakage.

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Atom editor mac version download

Atom editor mac version download

The most popular open source editor

SublimeText3 Linux new version

SublimeText3 Linux new version

SublimeText3 Linux latest version

mPDF

mPDF

mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

SecLists

SecLists

SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.