What is the difference between text files and binary files?

coldplay.xixi

Nov 20, 2020 am 09:42 AM

The difference between text files and binary files: 1. Text files are files based on character encoding; common encodings include ASCII and the Unicode encodings. 2. Binary files are files based on value encoding.

The difference between text files and binary files:

1. The definition of text files and binary files

Computer storage is physically binary, so the difference between text files and binary files is not physical but logical: the two differ only at the encoding level.

Simply put, a text file is a file based on character encoding; common encodings include ASCII and the Unicode encodings such as UTF-8 and UTF-16. A binary file is a file based on value encoding: the application decides what a given value means (this can be regarded as a custom encoding).

It follows that text files are, in the simple case, fixed-length encoded: the unit is the character, and each character has a fixed code in the chosen encoding. ASCII is a 7-bit code that is normally stored one character per 8-bit byte, and UTF-16 uses 16 bits for most characters (UTF-8, by contrast, uses one to four bytes per character, so not every text encoding is fixed-length). Binary files can be regarded as variable-length encoded, because the encoding is based on values, and how many bits represent a value is entirely up to the designer of the format. Take the familiar BMP file as an example. It begins with a fixed-layout file header: the first 2 bytes hold the signature "BM" that marks the file as a BMP, the next 4 bytes record the file size, the next 4 bytes are reserved, the following 4 bytes record the offset at which the pixel data begins, and so on. The encoding is based on values of different lengths (here 2 and 4 bytes), so BMP is a binary file.
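As a rough illustration of value-based encoding, the sketch below packs and unpacks the 14-byte BMP file header with Python's `struct` module. In the standard layout the size field is 4 bytes, followed by two 2-byte reserved fields and a 4-byte pixel-data offset. The header is built in memory rather than read from a real image, and the function name `parse_bmp_file_header` and the sample values are assumptions made for this example.

```python
import struct

# Layout of the 14-byte BMP file header, little-endian:
#   2s  signature, always b"BM"
#   I   file size in bytes (4 bytes)
#   HH  two reserved fields (2 bytes each)
#   I   offset from the start of the file to the pixel data (4 bytes)
BMP_FILE_HEADER = struct.Struct("<2sIHHI")


def parse_bmp_file_header(data: bytes) -> dict:
    """Decode the fixed-size file header at the start of a BMP file."""
    signature, file_size, _res1, _res2, pixel_offset = BMP_FILE_HEADER.unpack(
        data[:BMP_FILE_HEADER.size]
    )
    if signature != b"BM":
        raise ValueError("not a BMP file")
    return {"file_size": file_size, "pixel_offset": pixel_offset}


# A made-up header: a 70-byte file whose pixel data starts at byte 54.
sample = BMP_FILE_HEADER.pack(b"BM", 70, 0, 0, 54)
header = parse_bmp_file_header(sample)
print(len(sample), header)
```

The same bytes opened in a text editor would show "BM" followed by unreadable characters, because the remaining fields are numbers, not character codes.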

2. Access to text files and binary files

What happens when you open a file with a text tool? Take Notepad as an example. It first reads the binary bit stream that physically makes up the file (as mentioned earlier, storage is binary), then interprets this stream according to the decoding method you choose, and finally displays the result. Suppose the decoding method is ASCII, with each character stored in 8 bits. Notepad then interprets the stream 8 bits at a time. For example, given the stream "01000001_01000010_01000011_01000100" (the underscores are added only for readability), the first 8 bits '01000001' decode to the character 'A' in ASCII, and the other three groups decode to 'B', 'C', and 'D'. The stream is therefore interpreted as "ABCD", and Notepad displays "ABCD" on the screen.
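The decoding step a text tool performs can be sketched in a few lines of Python. This is only a minimal illustration, not how Notepad is implemented; the helper name `decode_bit_stream` is hypothetical, and the stream used here is the 8-bit ASCII codes for A, B, C, and D.

```python
def decode_bit_stream(bits: str, encoding: str = "ascii") -> str:
    """Turn a string of '0'/'1' characters into text, 8 bits at a time."""
    bits = bits.replace("_", "")  # underscores are only there for readability
    if len(bits) % 8 != 0:
        raise ValueError("bit stream length must be a multiple of 8")
    # Each group of 8 bits becomes one byte value.
    raw = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return raw.decode(encoding)


stream = "01000001_01000010_01000011_01000100"
text = decode_bit_stream(stream)
print(text)  # ABCD
```

Note that '01000000' (decimal 64) is the ASCII code for '@'; 'A' is '01000001' (decimal 65).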

In fact, whenever two things communicate, they rely on an agreed protocol and an agreed encoding. People communicate through words. In Chinese, the character for "mother" denotes the person who gave birth to you; that is an agreed code. In Japanese, however, the same character can denote the person you gave birth to, that is, a daughter. So when a Chinese speaker A and a Japanese speaker B use this character to communicate, misunderstandings are only to be expected. Opening a binary file with Notepad is a similar situation. Whatever file it opens, Notepad works according to an established character encoding (such as ASCII), so when it opens a binary file, garbled characters are inevitable: the encoding and the decoding do not correspond. For example, the stream '00000000_00000000_00000000_00000001' may represent the four-byte integer 1 in a binary file (in big-endian byte order). Interpreted by Notepad, it becomes the four control characters NUL, NUL, NUL, SOH.
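The mismatch between value encoding and character decoding can be reproduced directly. The sketch below assumes big-endian byte order for the integer; the `CONTROL_NAMES` table is a small hypothetical lookup covering only the two control characters that occur here.

```python
import struct

# ASCII names for the control characters used in this example.
CONTROL_NAMES = {0x00: "NUL", 0x01: "SOH"}

raw = struct.pack(">i", 1)                # four-byte big-endian integer 1
as_integer = struct.unpack(">i", raw)[0]  # decoded as a value
as_text = raw.decode("ascii")             # decoded as characters
names = [CONTROL_NAMES.get(b, chr(b)) for b in raw]

print(raw, as_integer, names)
```

The same four bytes yield the number 1 under one decoding and four unprintable control characters under the other, which is why a text editor shows a binary file as garbage.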

Storing a text file is essentially the reverse of reading it, so it is not described again here. Accessing a binary file works the same way as accessing a text file, except that the encoding and decoding methods differ.

3. Advantages and Disadvantages of Text Files and Binary Files

Because text files and binary files differ only in encoding, their advantages and disadvantages are those of their encodings; a book on coding covers these in more detail. It is generally held that text file encoding is based on fixed-length characters and is therefore easier to decode, while binary file encoding is variable-length, which makes it flexible and more space-efficient but harder to decode (each binary file format needs its own decoding method). As for space utilization, a binary file can use a single bit to carry a meaning (through bit operations), whereas any meaning in a text file takes at least one character.
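The space argument can be made concrete with a small comparison. This is a sketch under simple assumptions: the number and the eight flags are arbitrary sample data, ASCII is used for the text form, and a 4-byte unsigned integer is used for the binary form.

```python
import struct

value = 1234567890

as_text = str(value).encode("ascii")   # one byte per decimal digit
as_binary = struct.pack("<I", value)   # one 4-byte unsigned integer

# Eight yes/no flags: one character each as text, one bit each in binary.
flags = [True, False, True, True, False, False, True, False]
flags_text = "".join("1" if f else "0" for f in flags).encode("ascii")
packed = 0
for position, flag in enumerate(flags):
    if flag:
        packed |= 1 << position        # set the bit for this flag
flags_binary = bytes([packed])

print(len(as_text), len(as_binary), len(flags_text), len(flags_binary))
```

The text form is larger in both cases, but it can be read in any editor; the binary form is compact, but only a program that knows the layout can interpret it.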

Many books also state that text files are more readable but need conversion time when stored (reading and writing require encoding and decoding), while binary files are less readable but need no conversion time (values are written directly, with no encoding or decoding). Readability here is from the software user's point of view: almost any text file can be browsed with a general tool such as Notepad, so text files are said to be readable, whereas reading or writing a particular binary file requires a decoder for that specific format, so binary files are said to be less readable. For example, reading BMP files requires image-viewing software. Storage conversion time is best understood from the programmer's point of view. On some operating systems, such as Windows, text-mode I/O converts line endings (replacing '\n' with '\r\n' on writing), so every read or write has to check, character by character, whether the current character is '\n' or part of '\r\n'. This conversion is not needed on Linux. It may, however, come up again when text files are shared between two different operating systems (for example, between Linux and Windows). How to perform that conversion is covered in the next article, "Conversion between Linux Text Files and Windows Text Files".

4. Text-mode and binary-mode reading and writing in C

Text-mode and binary-mode reading and writing in C are a programming-level matter tied to the specific operating system. The view that "files read and written in text mode must be text files, and files read and written in binary mode must be binary files" is therefore wrong. The description below does not name the operating system each time, but it always refers to Windows. In C, the only difference between text mode and binary mode lies in how line endings are handled. When writing in text mode, every '\n' (0AH, line feed) is replaced with '\r\n' (0D0AH, carriage return plus line feed) before it is written to the file; when reading in text mode, every '\r\n' is changed to '\n' before it is placed in the read buffer. Text mode is slower precisely because of this conversion between '\n' and '\r\n'. In binary mode there is no conversion: the data in the write buffer is written to the file unchanged.

In short, from a programming perspective, both text-mode and binary-mode I/O in C move data between a buffer and the binary stream in the file; text mode merely adds the line-ending conversion. So when the write buffer contains no line feed '\n' (0AH), writing in text mode produces the same result as writing in binary mode. Likewise, when the file contains no '\r\n' (0D0AH), reading in text mode produces the same result as reading in binary mode.
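The line-ending conversion described above can be observed without a C compiler. The sketch below uses Python rather than C so that it behaves the same on every platform: passing `newline="\r\n"` to `open()` stands in for Windows text mode, and the `"rb"` mode stands in for C's binary mode. The file name `demo.txt` and the sample strings are assumptions made for this example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Text-mode write with Windows-style translation: each '\n' becomes '\r\n'.
with open(path, "w", encoding="ascii", newline="\r\n") as f:
    f.write("line1\nline2\n")

# Binary-mode read: the bytes exactly as stored, with no translation.
with open(path, "rb") as f:
    raw = f.read()

# Text-mode read with newline translation: '\r\n' comes back as '\n'.
with open(path, "r", encoding="ascii", newline=None) as f:
    text = f.read()

# With no '\n' in the data, a text-mode write stores the bytes unchanged.
with open(path, "w", encoding="ascii", newline="\r\n") as f:
    f.write("no newline here")
with open(path, "rb") as f:
    unchanged = f.read()

print(raw, repr(text), unchanged)
```

The stored file is two bytes longer than the string that was written, one extra '\r' per line, and the last write shows that text mode and binary mode agree when the data contains no line feed.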
