What is the format of zip?

WBOY
2022-07-06 16:01:55

ZIP is a file format for data compression and archiving; its MIME type is "application/zip". ZIP is a relatively simple archive format that compresses each file separately. Because files are compressed separately, a single file can be retrieved without reading the data of the other files, and different algorithms can be used for different files.

The operating environment of this tutorial: Windows 10 system, Dell G3 computer.

What is zip format?

The ZIP file format is a file format for data compression and archiving. It was invented by Phil Katz, who also designed the Deflate algorithm that ZIP most commonly uses.

He published the format in early 1989. ZIP files usually use the suffix ".zip", and the MIME type is application/zip. Today ZIP is one of several mainstream compression formats; its competitors include the RAR format and the open-source 7z format. RAR and 7z generally achieve higher compression ratios than ZIP, and 7z is gradually being used in more fields because 7-Zip provides a free compression tool. Microsoft has shipped built-in support for the ZIP format since the Windows ME operating system, so users can open and create ZIP archives even without decompression software installed. OS X and popular Linux operating systems provide similar built-in support. As a result, ZIP is often the most common choice for distributing files over the Internet.

Technical Introduction

ZIP is a fairly simple archive format that compresses each file individually. Compressing files separately allows an individual file to be retrieved without reading the data of the others; in theory, it also allows different algorithms to be used for different files. Regardless of the method used, one caveat of this design is that an archive containing many small files will be significantly larger than it would be if the files were first combined into a single file and then compressed (the classic example on Unix-like systems is the ordinary tar.gz archive, which is a TAR archive compressed with gzip).
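The size caveat above can be checked with a small experiment. The following is a minimal sketch using Python's standard zipfile and tarfile modules; the file names and contents are made-up sample data, and the exact sizes will vary with the input.

```python
import io
import tarfile
import zipfile


def build_members(count=200):
    """Return (name, data) pairs for many small, similar files (sample data)."""
    return [
        (f"file_{i:03d}.txt", f"record {i}: the quick brown fox\n".encode() * 4)
        for i in range(count)
    ]


def zip_size(members):
    """Size of a ZIP archive in which every member is deflated on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in members:
            zf.writestr(name, data)
    return len(buf.getvalue())


def tar_gz_size(members):
    """Size of a TAR archive compressed as one gzip stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in members:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


members = build_members()
print("zip:", zip_size(members), "bytes")
print("tar.gz:", tar_gz_size(members), "bytes")
```

With this kind of input the tar.gz result should come out smaller, because gzip can reuse repetition across file boundaries, while ZIP restarts compression for every member and stores separate headers for each one.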

The ZIP specification states that files can be stored without compression or using different compression algorithms. However, in practice, ZIP almost always uses Katz's DEFLATE algorithm.
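As an illustration of per-file methods, the sketch below uses Python's standard zipfile module to store one member uncompressed and another with Deflate in the same archive, then reads a single member back by name. The member names and contents are hypothetical.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # Method 0: stored without compression.
    zf.writestr("raw.bin", b"\x00\x01\x02\x03", compress_type=zipfile.ZIP_STORED)
    # Method 8: Deflate.
    zf.writestr("text.txt", b"hello zip " * 100, compress_type=zipfile.ZIP_DEFLATED)

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    # The method number is recorded separately for every member.
    methods = {info.filename: info.compress_type for info in zf.infolist()}
    # Only the requested member is decompressed.
    text = zf.read("text.txt")

print(methods)
print(len(text))
```

The numbers reported in compress_type are the method numbers used by the ZIP specification (0 for stored, 8 for Deflate).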

ZIP supports simple password-based symmetric encryption, which is known to have serious flaws: it is vulnerable to known-plaintext attacks, dictionary attacks and brute-force attacks. ZIP also supports splitting an archive into multiple volumes.

More recent versions of ZIP have added new features, including new compression and encryption methods, but these features are not supported by many tools and have not been widely adopted.

Disadvantages:

Because the format appeared so early, ZIP files have a number of shortcomings compared with other compression formats that cannot be ignored.

The original format does not support Unicode file names, which makes sharing some resources difficult, especially in exchanges within the East Asian cultural sphere. Its compression ratio cannot match that of 7z, and it lacks a recovery-record repair feature like the one in WinRAR. These shortcomings have contributed to its decline.
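Newer revisions of the PKWARE specification define general-purpose flag bit 11 to mark a file name as UTF-8, which eases the file name problem for tools that honor it. The sketch below, which assumes Python's standard zipfile module and uses made-up file names, shows that the flag is set only when a name cannot be written as plain ASCII.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: file name is UTF-8


def utf8_flag_set(name):
    """Write one member called `name` and report (name read back, flag set?)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, b"data")
    with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
        info = zf.infolist()[0]
        return info.filename, bool(info.flag_bits & UTF8_FLAG)


print(utf8_flag_set("readme.txt"))
print(utf8_flag_set("文件.txt"))
```

Older tools that ignore this flag may still display such names incorrectly, so the interoperability concern described above remains in practice.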

Compression methods

The ZIP specification defines the following compression methods:

Shrinking (Method 1)

Shrinking is a variant of LZW with minor adjustments, and was therefore also affected by the LZW patent issues. It was never clear whether the patent covered unshrinking (decompression), but some open source projects (such as Info-ZIP) decided to err on the side of caution and did not include unshrinking support in the default build.

Reducing (Methods 2-5)

Reducing combines the compression of repeated byte sequences with a probability-based encoding applied to the result.

Imploding (Method 6)

Imploding involves using a sliding window to compress repeated byte sequences, and then using multiple Shannon-Fano trees to compress the result.

Tokenizing (Method 7)

The method number for Tokenizing is reserved. The PKWARE specification does not define an algorithm for it.

Deflate and Enhanced Deflate (Methods 8 and 9)

These methods use the well-known Deflate algorithm. Deflate allows windows of up to 32 KiB. Enhanced Deflate allows windows of up to 64 KiB. The enhanced version is intended to compress slightly better, but it is not widely supported.

Deflate comparison size: 52.1 MiB (tested using pkzip for Windows, version 8.00.0038)

Enhanced Deflate comparison size: 52.8 MiB (tested using pkzip for Windows, version 8.00.0038)
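The effect of the 32 KiB window can be seen with a small experiment. This is a minimal sketch using Python's standard zlib module, which implements Deflate only (not Enhanced Deflate); the helper names and the random test data are assumptions made for illustration.

```python
import random
import zlib


def raw_deflate_size(data):
    """Compress with raw Deflate (no zlib header) and a 32 KiB window."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)  # -15: raw stream, 2**15 window
    return len(comp.compress(data) + comp.flush())


rng = random.Random(0)  # fixed seed so the run is repeatable


def noise(n):
    """Return n pseudo-random bytes, which are incompressible on their own."""
    return bytes(rng.getrandbits(8) for _ in range(n))


near = noise(20_000) * 2  # the repeat starts 20,000 bytes back: inside the window
far = noise(40_000) * 2   # the repeat starts 40,000 bytes back: outside the window

print("near:", len(near), "->", raw_deflate_size(near))
print("far: ", len(far), "->", raw_deflate_size(far))
```

The first input shrinks to roughly half its size because the second copy can be encoded as back-references, while the second input barely shrinks at all because the earlier copy has already left the window. A 64 KiB window would reach the repeat in the second case too.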

PKWARE Data Compression Library Imploding (Method 10)

The official ZIP format specification gives no further information on PKWARE Data Compression Library Imploding.

Comparison size: 61.6 MiB (tested using pkzip for Windows, version 8.00.0038, binary mode selected)

Method 11

This method is reserved by PKWARE.

Bzip2 (Method 12)

This method uses the well-known bzip2 algorithm. It usually compresses better than Deflate but is not widely supported by tools, especially on Windows platforms.

Comparison size: 50.6 MiB (tested using pkzip for Windows, version 8.00.0038)
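For readers who want to try method 12 themselves, Python's standard zipfile module can write bzip2-compressed members when the bz2 module is available. The sketch below uses hypothetical sample data and a hypothetical helper name; the sizes it prints are not comparable with the pkzip figures above.

```python
import io
import zipfile

data = b"the same line of text repeated many times\n" * 2000


def member_info(method):
    """Write `data` with the given method; return (method number, compressed size)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("sample.txt", data, compress_type=method)
    with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
        info = zf.getinfo("sample.txt")
        assert zf.read("sample.txt") == data  # round trip must be lossless
        return info.compress_type, info.compress_size


print("deflate:", member_info(zipfile.ZIP_DEFLATED))
print("bzip2:  ", member_info(zipfile.ZIP_BZIP2))
```

An archive written this way may not open in tools that only understand Deflate, which is the support gap described above.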
