Detailed explanation of Huffman encoding/decoding steps in PHP-PHP Tutorial-php.cn

Home

Backend Development

PHP Tutorial

Detailed explanation of Huffman encoding/decoding steps in PHP

php中世界最好的语言

May 16, 2018 pm 03:15 PM

phpstep

This time I will bring you a detailed explanation of the steps to implement Huffman encoding/decoding in PHP. What are the precautions for implementing Huffman encoding/decoding in PHP? Here are practical cases, let’s take a look.

This article will use PHP to practice Huffman encoding and decoding.

1. Encoding

Word count

The first step of Huffman encoding is to count the occurrence of each character in the document times, PHP's built-in function count_chars() can do:

$input = file_get_contents(&#39;input.txt&#39;);
$stat = count_chars($input, 1);

Construct Huffman tree

Connect Next, construct the Huffman tree based on the statistical results. Construction method is described in detail in Wikipedia. Here is a simple version written in PHP:

$huffmanTree = [];
foreach ($stat as $char => $count) {
  $huffmanTree[] = [
    &#39;k&#39; => chr($char),
    &#39;v&#39; => $count,
    &#39;left&#39; => null,
    &#39;right&#39; => null,
  ];
}
// 构造树的层级关系，思想见wiki：https://zh.wikipedia.org/wiki/%E9%9C%8D%E5%A4%AB%E6%9B%BC%E7%BC%96%E7%A0%81
$size = count($huffmanTree);
for ($i = 0; $i !== $size - 1; $i++) {
  uasort($huffmanTree, function ($a, $b) {
    if ($a[&#39;v&#39;] === $b[&#39;v&#39;]) {
      return 0;
    }
    return $a[&#39;v&#39;] < $b[&#39;v&#39;] ? -1 : 1;
  });
  $a = array_shift($huffmanTree);
  $b = array_shift($huffmanTree);
  $huffmanTree[] = [
    &#39;v&#39; => $a[&#39;v&#39;] + $b[&#39;v&#39;],
    &#39;left&#39; => $b,
    &#39;right&#39; => $a,
  ];
}
$root = current($huffmanTree);

After calculation, $root will point to the root node of the Huffman tree

Generate a coding dictionary based on the Huffman tree

With the Huffman tree, you can generate a dictionary for encoding:

function buildDict($elem, $code = &#39;&#39;, &$dict) {
  if (isset($elem[&#39;k&#39;])) {
    $dict[$elem[&#39;k&#39;]] = $code;
  } else {
    buildDict($elem[&#39;left&#39;], $code.&#39;0&#39;, $dict);
    buildDict($elem[&#39;right&#39;], $code.&#39;1&#39;, $dict);
  }
}
$dict = [];
buildDict($root, &#39;&#39;, $dict);

Write the file

Use the dictionary to encode the file content and write Import the file. There are several things to pay attention to when writing Huffman encoding to a file:

After writing the encoding dictionary and encoding content to the file together, it is impossible to distinguish their boundaries, so they need to be written at the beginning of the file. Number of bytes occupied

The fwrite() function provided by PHP can write 8-bit (one byte) or an integer multiple of 8 bits at a time. However, in Huffman encoding, a character may be represented by only 1-bit, and PHP does not support the operation of writing only 1-bit to the file. Therefore, we need to splice the encoding ourselves, and only write the file after every 8-bit is obtained.

Write every time 8-bit is obtained

Similar to the second item, the final file size must be an integer multiple of 8-bit. So if the size of the entire encoding is 8001-bit, 7 0s must be added at the end

$dictString = serialize($dict);
// 写入字典和编码各自占用的字节数
$header = pack(&#39;VV&#39;, strlen($dictString), strlen($input));
fwrite($outFile, $header);
// 写入字典本身
fwrite($outFile, $dictString);
// 写入编码的内容
$buffer = &#39;&#39;;
$i = 0;
while (isset($input[$i])) {
  $buffer .= $dict[$input[$i]];
  while (isset($buffer[7])) {
    $char = bindec(substr($buffer, 0, 8));
    fwrite($outFile, chr($char));
    $buffer = substr($buffer, 8);
  }
  $i++;
}
// 末尾的内容如果没有凑齐 8-bit，需要自行补齐
if (!empty($buffer)) {
  $char = bindec(str_pad($buffer, 8, &#39;0&#39;));
  fwrite($outFile, chr($char));
}
fclose($outFile);

Decoding

The decoding of Huffman encoding is relatively simple: read first Get the encoding dictionary, and then decode the original characters according to the dictionary.

There is a problem that needs to be noted during the decoding process: since we have added several 0-bits at the end of the file during the encoding process, if these 0-bits happen to be the encoding of a certain character in the dictionary, This will cause incorrect decoding.

So during the decoding process, when the number of decoded characters reaches the document length, decoding will stop.

<?php
$content = file_get_contents(&#39;a.out&#39;);
// 读出字典长度和编码内容长度
$header = unpack(&#39;VdictLen/VcontentLen&#39;, $content);
$dict = unserialize(substr($content, 8, $header[&#39;dictLen&#39;]));
$dict = array_flip($dict);
$bin = substr($content, 8 + $header[&#39;dictLen&#39;]);
$output = &#39;&#39;;
$key = &#39;&#39;;
$decodedLen = 0;
$i = 0;
while (isset($bin[$i]) && $decodedLen !== $header[&#39;contentLen&#39;]) {
  $bits = decbin(ord($bin[$i]));
  $bits = str_pad($bits, 8, &#39;0&#39;, STR_PAD_LEFT);
  for ($j = 0; $j !== 8; $j++) {
    // 每拼接上 1-bit，就去与字典比对是否能解码出字符
    $key .= $bits[$j];
    if (isset($dict[$key])) {
      $output .= $dict[$key];
      $key = &#39;&#39;;
      $decodedLen++;
      if ($decodedLen === $header[&#39;contentLen&#39;]) {
        break;
      }
    }
  }
  $i++;
}
echo $output;

Test

We save the HTML code of the Huffman coding Wiki page locally and conduct the Huffman coding test. The test results are:

Before encoding: 418,504 bytes

After encoding: 280,127 bytes

The space is saved by 33%. If the original text has a lot of repeated content, the space saved by Huffman encoding can reach 50% Above.

In addition to the text content, we try to Huffman encode a binary file, such as the f.lux installation program. The test results are as follows:

Before encoding: 770,384 bytes

After encoding: 773,076 bytes

After encoding, it takes up more space. On the one hand, it is because we do not do additional processing when storing the dictionary, which takes up a lot of space. Less space. On the other hand, in binary files, the probability of each character appearing is relatively even, and the advantages of Huffman coding cannot be used.

I believe you have mastered the method after reading the case in this article. For more exciting information, please pay attention to other related articles on the php Chinese website!

Recommended reading:

Detailed explanation of the steps for using the PHP quick sort algorithm

Detailed explanation of the iterator steps implemented by PHP based on SPL

The above is the detailed content of Detailed explanation of Huffman encoding/decoding steps in PHP. For more information, please follow other related articles on the PHP Chinese website!

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

How can you prevent session fixation attacks?Apr 28, 2025 am 12:25 AM

Effective methods to prevent session fixed attacks include: 1. Regenerate the session ID after the user logs in; 2. Use a secure session ID generation algorithm; 3. Implement the session timeout mechanism; 4. Encrypt session data using HTTPS. These measures can ensure that the application is indestructible when facing session fixed attacks.

How do you implement sessionless authentication?Apr 28, 2025 am 12:24 AM

Implementing session-free authentication can be achieved by using JSONWebTokens (JWT), a token-based authentication system where all necessary information is stored in the token without server-side session storage. 1) Use JWT to generate and verify tokens, 2) Ensure that HTTPS is used to prevent tokens from being intercepted, 3) Securely store tokens on the client side, 4) Verify tokens on the server side to prevent tampering, 5) Implement token revocation mechanisms, such as using short-term access tokens and long-term refresh tokens.

What are some common security risks associated with PHP sessions?Apr 28, 2025 am 12:24 AM

The security risks of PHP sessions mainly include session hijacking, session fixation, session prediction and session poisoning. 1. Session hijacking can be prevented by using HTTPS and protecting cookies. 2. Session fixation can be avoided by regenerating the session ID before the user logs in. 3. Session prediction needs to ensure the randomness and unpredictability of session IDs. 4. Session poisoning can be prevented by verifying and filtering session data.

How do you destroy a PHP session?Apr 28, 2025 am 12:16 AM

To destroy a PHP session, you need to start the session first, then clear the data and destroy the session file. 1. Use session_start() to start the session. 2. Use session_unset() to clear the session data. 3. Finally, use session_destroy() to destroy the session file to ensure data security and resource release.

How can you change the default session save path in PHP?Apr 28, 2025 am 12:12 AM

How to change the default session saving path of PHP? It can be achieved through the following steps: use session_save_path('/var/www/sessions');session_start(); in PHP scripts to set the session saving path. Set session.save_path="/var/www/sessions" in the php.ini file to change the session saving path globally. Use Memcached or Redis to store session data, such as ini_set('session.save_handler','memcached'); ini_set(

How do you modify data stored in a PHP session?Apr 27, 2025 am 12:23 AM

TomodifydatainaPHPsession,startthesessionwithsession_start(),thenuse$_SESSIONtoset,modify,orremovevariables.1)Startthesession.2)Setormodifysessionvariablesusing$_SESSION.3)Removevariableswithunset().4)Clearallvariableswithsession_unset().5)Destroythe

Give an example of storing an array in a PHP session.Apr 27, 2025 am 12:20 AM

Arrays can be stored in PHP sessions. 1. Start the session and use session_start(). 2. Create an array and store it in $_SESSION. 3. Retrieve the array through $_SESSION. 4. Optimize session data to improve performance.

How does garbage collection work for PHP sessions?Apr 27, 2025 am 12:19 AM

PHP session garbage collection is triggered through a probability mechanism to clean up expired session data. 1) Set the trigger probability and session life cycle in the configuration file; 2) You can use cron tasks to optimize high-load applications; 3) You need to balance the garbage collection frequency and performance to avoid data loss.

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Assassin's Creed Shadows: Seashell Riddle Solution

1 months agoByDDD

What's New in Windows 11 KB5054979 & How to Fix Update Issues

3 weeks agoByDDD

Where to find the Crane Control Keycard in Atomfall

1 months agoByDDD

How to fix KB5055523 fails to install in Windows 11?

2 weeks agoByDDD

InZoi: How To Apply To School And University

3 weeks agoByDDD

Hot Tools

Safe Exam Browser

Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.

Zend Studio 13.0.1

Powerful PHP integrated development environment

mPDF

mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),