search
HomeBackend DevelopmentPHP Tutorial用PHP实现POP3邮件的解码(二)

MIME 编码方式简介

Subject: =?gb2312?B?xOO6w6Oh?=

这里是邮件的主题,可是因为编码了,我们看不出是什么内容,其原来的文本是:“你好!”我们先看看 MIME 编码的两种方法。

对邮件进行编码最初的原因是因为 Internet 上的很多网关不能正确传输8 bit 内码的字符,比如汉字等。编码的原理就是把 8 bit 的内容转换成 7 bit 的形式以能正确传输,在接收方收到之后,再将其还原成 8 bit 的内容。

MIME 是“多用途网际邮件扩充协议”的缩写,在 MIME 协议之前,邮件的编码曾经有过 UUENCODE 等编码方式 ,但是由于 MIME 协议算法简单,并且易于扩展,现在已经成为邮件编码方式的主流,不仅是用来传输 8 bit 的字符,也可以用来传送二进制的文件 ,如邮件附件中的图像、音频等信息,而且扩展了很多基于MIME 的应用。从编码方式来说,MIME 定义了两种编码方法Base64与QP(Quote-PRintable) :

Base 64 是一种通用的方法,其原理很简单,就是把三个Byte的数据用 4 个Byte表示,这样,这四个Byte 中,实际用到的都只有前面6 bit,这样就不存在只能传输 7bit 的字符的问题了。Base 64的缩写一般是“B”,像这封信中的Subject 就是用的 Base64 编码。

另一种方法是QP(Quote-Printable) 方法,通常缩写为“Q”方法,其原理是把一个 8 bit 的字符用两个16进制数值表示,然后在前面加“=”。所以我们看到经过QP编码后的文件通常是这个样子:=B3=C2=BF=A1=C7=E5=A3=AC=C4=FA=BA=C3=A3=A1。

在 php 里,系统有两个函数可以很方便地实现解码:base64_decode()与quoted_printable_decode(),前者可用于base64 编码的解码,后者是用于 QP 编码方法的解码。

现在我们再来看看Subject: =?gb2312?B?xOO6w6Oh?= 这一主题的内容,这不是一段完整的编码,只有部分是编码了的,这个部分用 =? ?= 两个标记括起来,=? 后面说明的是这段文字的字符集是 GB2312 ,然后一个 ? 后面的一个 B 表示的是用的 Base64 编码。通过这段分析,我们来看一下这个 MIME 解码的函数:(该函数由 PHPX.COM 站长 Sadly 提供,本人将其放入一个类中,并做了少量的修改,在此致谢)

function decode_mime($string) {

$pos = strpos($string, '=?');

if (!is_int($pos)) {

   return $string;

}

$preceding = substr($string, 0, $pos); // save any preceding text

$search = substr($string, $pos+2); /* the mime header spec says this is the longest a single encoded Word can be */

$d1 = strpos($search, '?');

if (!is_int($d1)) {

   return $string;

}

$charset = substr($string, $pos+2, $d1); //取出字符集的定义部分

$search = substr($search, $d1+1); //字符集定义以后的部分=>$search;

$d2 = strpos($search, '?');

if (!is_int($d2)) {

   return $string;

}

$encoding = substr($search, 0, $d2); ////两个? 之间的部分编码方式 :q 或 b 

$search = substr($search, $d2+1);

$end = strpos($search, '?='); //$d2+1 与 $end 之间是编码了 的内容:=> $endcoded_text;

if (!is_int($end)) {

   return $string;

}

$encoded_text = substr($search, 0, $end);

$rest = substr($string, (strlen($preceding . $charset . $encoding . $encoded_text)+6)); //+6 是前面去掉的 =????= 六个字符

switch ($encoding) {

case 'Q':

case 'q':

   //$encoded_text = str_replace('_', '%20', $encoded_text);

   //$encoded_text = str_replace('=', '%', $encoded_text);

   //$decoded = urldecode($encoded_text);

$decoded=quoted_printable_decode($encoded_text);

   if (strtolower($charset) == 'windows-1251') {

   $decoded = convert_cyr_string($decoded, 'w', 'k');

   }

   break;

case 'B':

case 'b':

   $decoded = base64_decode($encoded_text);

   if (strtolower($charset) == 'windows-1251') {

   $decoded = convert_cyr_string($decoded, 'w', 'k');

   }

   break;

default:

   $decoded = '=?' . $charset . '?' . $encoding . '?' . $encoded_text . '?=';

   break;

}

return $preceding . $decoded . $this->decode_mime($rest);

}

这个函数用了递归的方法来实现一段包含有如上的 Subject 段的字符的解码。程序中已经加上了注释。相信有点PHP 编程基础的人都能够看得明白。该函数也是调用的base64_decode()与quoted_printable_decode()两个系统函数实现的解码,但是需要对邮件源文件进行大量的字符串的分析。不过,PHP 的字符串操作可以算是所有语言里最为方便自由的。函数的最后return $preceding . $decoded . $this->decode_mime($rest); 实现递归解码,因为这个函数实际上是放在后面要介绍的一个 MIME解码的类中的,所以用了 $this->decode_mime($rest)这种形式的调用方法。

下面我们来看正文。这里关系到 MIME 的一些头信息,我们先做一个简单的介绍(如果读者有兴趣了解更多的内容,请参考 MIME 的官方文档)。

MIME-Version: 1.0

表示使用的 MIME 的版本号,一般是1.0;

Content-Type: 定义了正文的类型,我们实际上是通过这个标识来知道正文内是什么类型的文件,比如:

text/plain 表示的是无格式的文本正文,
text/html 表示的 Html 文档,
image/gif 表示的是 gif 格式的图片等等。

在本文中特别要说明一下的是邮件中常用到的复合类型。multipart 类型表示正文是由多个部分组成的,后面的子类型说明的是这些部分之间的关系,邮件中用到的三个类型有,

multipart/alternative:表示正文由两个部分组成,可以选择其中的任意一个。主要作用是在征文同时有 text 格式和 html 格式时,可以在两个正文中选择一个来显示,支持 html 格式的邮件客户端软件一般会显示其 HTML 正文,而不支持的则会显示其 Text 正文;

multipart/mixed :表示文档的多个部分是混合的,指正文与附件的关系。如果邮件的 MIME 类型是

multipart/mixed,即表示邮件带有附件;
multipart/related :表示文档的多个部分是相关的,一般用来描述 Html 正文与其相关的图片。

这些复合类型又是可以嵌套使用的,比如说一个带有附件的邮件,同时有 html 与 text 两种格式的正文,则邮件的结构是:

Content-Type: multipart/mixed

部分一:

Content Type : multipart/alternative:

Text 正文;

Html 格式的正文 

部分二:

附件

邮件结束符;

由于复合类型由多个部分组成,因此,需要一个分隔符来分隔这多个部分,这就是上面的邮件源文件中的boundary="----=_NextPart_000_0007_01C03166.5B1E9510"所描述的,对于每一个Contect type :multipart/* 的内容,都会有这么一个说明,表示多个部分之间的分隔,这个分隔符是正文中不可能出现的一串古字符的组合,在文档中,以 "--" 加上这个boundary 来表示一个部分的开始,在文档的结束,以"--"加boundary再在最后加上 "--" 来表示文档的结束。由于复合类型是可以嵌套使用的,因此,邮件中可能会多个 boundary 。

还有一个最重要的 MIME 头标签:

Content-Transfer-Encoding: base64 它表示了这个部分文档的编码方式,也就是我们上面所介绍的Base64或QP(Quote-Printable)。我们只有识别了这个说明,才能用正确的解码方式实现对其解码。

  限于篇幅,对于 MIME 的介绍就只说到这里。下面我将给出一个解码MIME邮件的类,并对其做简要说明。

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
PHP's Current Status: A Look at Web Development TrendsPHP's Current Status: A Look at Web Development TrendsApr 13, 2025 am 12:20 AM

PHP remains important in modern web development, especially in content management and e-commerce platforms. 1) PHP has a rich ecosystem and strong framework support, such as Laravel and Symfony. 2) Performance optimization can be achieved through OPcache and Nginx. 3) PHP8.0 introduces JIT compiler to improve performance. 4) Cloud-native applications are deployed through Docker and Kubernetes to improve flexibility and scalability.

PHP vs. Other Languages: A ComparisonPHP vs. Other Languages: A ComparisonApr 13, 2025 am 12:19 AM

PHP is suitable for web development, especially in rapid development and processing dynamic content, but is not good at data science and enterprise-level applications. Compared with Python, PHP has more advantages in web development, but is not as good as Python in the field of data science; compared with Java, PHP performs worse in enterprise-level applications, but is more flexible in web development; compared with JavaScript, PHP is more concise in back-end development, but is not as good as JavaScript in front-end development.

PHP vs. Python: Core Features and FunctionalityPHP vs. Python: Core Features and FunctionalityApr 13, 2025 am 12:16 AM

PHP and Python each have their own advantages and are suitable for different scenarios. 1.PHP is suitable for web development and provides built-in web servers and rich function libraries. 2. Python is suitable for data science and machine learning, with concise syntax and a powerful standard library. When choosing, it should be decided based on project requirements.

PHP: A Key Language for Web DevelopmentPHP: A Key Language for Web DevelopmentApr 13, 2025 am 12:08 AM

PHP is a scripting language widely used on the server side, especially suitable for web development. 1.PHP can embed HTML, process HTTP requests and responses, and supports a variety of databases. 2.PHP is used to generate dynamic web content, process form data, access databases, etc., with strong community support and open source resources. 3. PHP is an interpreted language, and the execution process includes lexical analysis, grammatical analysis, compilation and execution. 4.PHP can be combined with MySQL for advanced applications such as user registration systems. 5. When debugging PHP, you can use functions such as error_reporting() and var_dump(). 6. Optimize PHP code to use caching mechanisms, optimize database queries and use built-in functions. 7

PHP: The Foundation of Many WebsitesPHP: The Foundation of Many WebsitesApr 13, 2025 am 12:07 AM

The reasons why PHP is the preferred technology stack for many websites include its ease of use, strong community support, and widespread use. 1) Easy to learn and use, suitable for beginners. 2) Have a huge developer community and rich resources. 3) Widely used in WordPress, Drupal and other platforms. 4) Integrate tightly with web servers to simplify development deployment.

Beyond the Hype: Assessing PHP's Role TodayBeyond the Hype: Assessing PHP's Role TodayApr 12, 2025 am 12:17 AM

PHP remains a powerful and widely used tool in modern programming, especially in the field of web development. 1) PHP is easy to use and seamlessly integrated with databases, and is the first choice for many developers. 2) It supports dynamic content generation and object-oriented programming, suitable for quickly creating and maintaining websites. 3) PHP's performance can be improved by caching and optimizing database queries, and its extensive community and rich ecosystem make it still important in today's technology stack.

What are Weak References in PHP and when are they useful?What are Weak References in PHP and when are they useful?Apr 12, 2025 am 12:13 AM

In PHP, weak references are implemented through the WeakReference class and will not prevent the garbage collector from reclaiming objects. Weak references are suitable for scenarios such as caching systems and event listeners. It should be noted that it cannot guarantee the survival of objects and that garbage collection may be delayed.

Explain the __invoke magic method in PHP.Explain the __invoke magic method in PHP.Apr 12, 2025 am 12:07 AM

The \_\_invoke method allows objects to be called like functions. 1. Define the \_\_invoke method so that the object can be called. 2. When using the $obj(...) syntax, PHP will execute the \_\_invoke method. 3. Suitable for scenarios such as logging and calculator, improving code flexibility and readability.

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. Best Graphic Settings
3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. How to Fix Audio if You Can't Hear Anyone
3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
WWE 2K25: How To Unlock Everything In MyRise
4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

mPDF

mPDF

mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

DVWA

DVWA

Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software

Dreamweaver Mac version

Dreamweaver Mac version

Visual web development tools

SecLists

SecLists

SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.