Background

HHVM is a high-performance PHP virtual machine developed by Facebook, claimed to be 9 times faster than the official implementation. I was curious, so I took some time to learn about it and wrote this article. I hope it can answer two questions:
What would you do?

Before discussing how HHVM is implemented, let's put ourselves in Facebook's position. Suppose you have a website written in PHP that runs into performance problems, and analysis shows that a large share of the resources is consumed in PHP. How would you optimize PHP performance? For example, there are several ways:
Option 1 is almost infeasible. Ten years ago, Joel used the example of Netscape to warn that a rewrite throws away years of accumulated experience, and that is especially true for a product with business logic as complex as Facebook's. There is simply too much PHP code: it is said to be 20 million lines (quoted from [PHP on the Metal with HHVM]). The cost of changing it is probably greater than the cost of writing a virtual machine, and for a team of thousands of people, learning from scratch is unacceptable.

Option 2 is the safest solution and allows gradual migration. In fact, Facebook has been working hard on this and has developed RPC solutions such as Thrift. The other main language used inside Facebook is C++. You can see this in the early Thrift code, where the implementations for other languages were very crude and could not be used in production. It is said that the PHP:C++ ratio at Facebook has shifted from 9:1 to 7:3. With Andrei Alexandrescu there, C++ is becoming more and more popular at Facebook, but this only solves part of the problem. After all, C++ development costs much more than PHP, so it is not suitable for code that changes frequently, and too many RPC calls seriously hurt performance.

Option 3 looks good but is hard to carry out in practice. Generally, performance bottlenecks are not concentrated in one obvious place; they are mostly the result of gradual accumulation. In addition, PHP extensions are expensive to develop, so this approach is generally used only for shared base libraries that rarely change, and it cannot solve many problems.

So none of the first three options solves the problem well, and Facebook really had no choice but to optimize PHP itself.

Faster PHP

Since we want to optimize PHP, how should we go about it? In my opinion, there are several methods:
Optimizing at the PHP language level is the simplest feasible approach. Of course Facebook thought of it, and developed the profiling tool XHProf, which is very helpful for locating performance bottlenecks. However, XHProf alone did not solve Facebook's problem, so let's keep going.

Next is option 2. Simply put, Zend's execution can be divided into two parts: compiling PHP into opcodes, and executing the opcodes. Optimizing Zend can therefore be considered from these two sides.

Optimizing the opcodes is a common practice: it avoids parsing PHP repeatedly, and some static compile-time optimizations can be applied as well, as Zend Optimizer Plus does. However, because PHP is so dynamic, this kind of optimization is limited; an optimistic estimate is a 20% performance improvement. Another idea is to change the opcode architecture itself, for example to a register-based design, but that requires too much rework and the gain would not be dramatic (maybe 30%?), so the return on investment is low.

The other approach is to optimize opcode execution. First, a brief word on how Zend executes. After Zend's interpreter reads an opcode, it calls a different function depending on the opcode (in some cases it is actually a switch; I have simplified the description for convenience), and that function performs the language-level operations (if you are interested, see the book "In-depth Understanding of the PHP Core"). So Zend has no complicated encapsulation or indirect calls, and as an interpreter it is already very well done.

To improve Zend's execution performance further, you need to understand how programs execute at a low level. For example, function calls have overhead, which can be reduced through inline threading. The principle resembles the inline keyword in C, except that the related functions are expanded at runtime and then executed in sequence (this is only an analogy; the actual implementation differs). It also avoids the waste caused by CPU branch mispredictions. You can also implement the interpreter in assembly, as JavaScriptCore and LuaJIT do; for details, I recommend reading Mike's explanation.

But both methods are too expensive to retrofit, even harder than a rewrite, especially when backward compatibility must be guaranteed, as you will see when I describe PHP's characteristics later.

Developing a high-performance virtual machine is no simple matter. It took the JVM more than 10 years to reach its current performance. So can such high-performance virtual machines be used directly to speed up PHP? That is the idea behind option 3.

In fact, people have tried this for a long time, for example with Quercus and IBM's P8. Quercus has hardly been used by anyone, and P8 is dead. Facebook also investigated this approach, and there were even unreliable rumors about it, but Facebook gave up on it in 2011.

Option 3 looks good, but the actual results are not ideal. According to many experts (such as Mike), a VM is always optimized for a particular language, and other languages implemented on it run into many bottlenecks, such as dynamic method calls; Dart's documentation discusses this. It is also said that Quercus performs about the same as Zend+APC (from [The HipHop Compiler for PHP]), so it does not make much sense.

However, OpenJDK has also been working hard in recent years. The recent Graal project looks quite good, and some languages have achieved significant results on it, but I have not had time to study Graal yet, so I cannot judge it here.

Next is option 4, which is exactly what HPHPc (the predecessor of HHVM) does.
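To make the interpreter loop described earlier more concrete (read an opcode, branch to the code that handles it, repeat), here is a minimal sketch of switch-based dispatch in C++. The three-instruction bytecode and the names `Op`, `Instr` and `run` are invented for illustration; they are not Zend's or HHVM's actual opcodes or APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical three-instruction bytecode, invented for this sketch.
enum class Op : std::uint8_t { Push, Add, Halt };

struct Instr {
  Op op;
  std::int64_t operand;  // only meaningful for Push
};

// Switch-based dispatch: fetch an instruction, branch on its opcode,
// run the matching handler, then move on to the next instruction.
inline std::int64_t run(const std::vector<Instr>& code) {
  std::vector<std::int64_t> stack;
  for (std::size_t pc = 0; pc < code.size(); ++pc) {
    const Instr& in = code[pc];
    switch (in.op) {
      case Op::Push:
        stack.push_back(in.operand);
        break;
      case Op::Add: {
        assert(stack.size() >= 2);  // malformed bytecode is not handled here
        std::int64_t rhs = stack.back();
        stack.pop_back();
        std::int64_t lhs = stack.back();
        stack.pop_back();
        stack.push_back(lhs + rhs);
        break;
      }
      case Op::Halt:
        return stack.empty() ? 0 : stack.back();
    }
  }
  return stack.empty() ? 0 : stack.back();
}
```

Every trip through the loop pays for a fetch and a branch on the opcode. That per-instruction overhead is what techniques such as inline threading or a hand-written assembly interpreter try to reduce.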
HPHPc works by converting PHP code into C++ and then compiling it into a native binary, so it can be considered an AOT (ahead-of-time) approach. For the technical details of the code conversion, see the paper The HipHop Compiler for PHP. The following screenshot from the paper gives an overview: ![]()

The biggest advantage of this approach is that it is simple to implement (compared with a VM), and it can apply many compile-time optimizations (because compilation is offline, it does not matter if it is slow); for example, the above example will …

In addition to HPHPc, there are two similar projects: Roadsend and phc. phc's approach is to convert PHP into C and then compile it. The following is an example of the C code generated for a call to file_get_contents: <div class="blockcode">
<div id="code_LjU"><ol>
<li>static php_fcall_info fgc_info;</li>
<li>php_fcall_info_init ("file_get_contents", &fgc_info);</li>
<li>php_hash_find (LOCAL_ST, "f", 5863275, &fgc_info.params);</li>
<li>php_call_function (&fgc_info) ;</li>
</ol></div>
</div> Speaking of phc, its author once lamented on his blog that he had gone to Facebook two years earlier to demonstrate phc and talked with the engineers there. Facebook's project then became popular as soon as it was released, while he had worked for 4 years and remained unknown, and now the future looked bleak...

Roadsend is no longer maintained. For a dynamic language like PHP, this approach has many limitations. Since files cannot be included dynamically, Facebook compiled all the files together, and the file deployed at release time reached 1 GB, which became increasingly unacceptable.

There is also a project called PHP QB. I did not look at it for lack of time, but I suspect it is something similar.

So only one road is left: write a faster PHP virtual machine and follow this dark road to the end. Maybe, like me, when you first heard that Facebook was going to build a virtual machine you thought it was outrageous, but if you analyze it carefully you will find that it is really the only way.

Faster virtual machines

Why is HHVM faster? JIT is named as the key technology in various news reports, but it is far from that simple. JIT is not a magic wand that improves performance with a single wave. JIT compilation itself takes time, and for simple programs it may be slower than an interpreter. The most extreme example is that the interpreter of LuaJIT 2 is slightly faster than the JIT of V8. So nothing is absolute; it is more about handling the details. The development history of HHVM is a history of continuous optimization. You can see in the picture below how it surpassed HPHPc little by little: ![]()

It is worth mentioning that ART, the new virtual machine in Android 4.4, uses the AOT approach (remember? HPHPc, mentioned earlier, does the same), and the result is twice as fast as the previous Dalvik, which used JIT. So JIT is not necessarily faster than AOT.
Therefore, this project is very risky. Without strong nerves and perseverance, it could easily have been abandoned halfway. Google once wanted to use JIT to improve Python's performance, but ultimately failed. For Google, Python does not really have performance issues (well, Google did once write its crawler in Python [see In The Plex], but that was back in 1996).

Compared with Google, Facebook obviously has greater motivation and determination. PHP is Facebook's most important language. Let's look at which experts Facebook has put on this project (incomplete list):
Although the list has no top virtual machine experts like Lars Bak or Mike Pall, if these experts work together, writing a virtual machine should not be a big problem. So what challenges do they face? We discuss them one by one.

What is the specification?

The first problem you face when writing your own PHP virtual machine is that PHP has no language specification, and the syntax is incompatible between many versions (even between minor versions such as 5.2.1 and 5.2.3). So how is the PHP language defined? Let's look at a statement from IEEE:
So the only way is to study Zend's implementation faithfully. Fortunately, this painful work had already been done once for HPHPc, so HHVM can reuse it directly, and the problem is not too big.

Language or extension?

Implementing the PHP language is not as simple as implementing a virtual machine. PHP also includes a wide range of extensions, which are integrated with the language; Zend works tirelessly to implement every function you might use. If you analyze the PHP source, you will find that it has more than 800,000 lines of C code after excluding blank lines and comments. Guess how much of that is the Zend engine? Just under 100,000 lines.

This is not a bad thing for developers, but it is tragic for engine implementers. Compare it with Java: to write a Java virtual machine you only need to implement bytecode interpretation and some basic JNI calls, because most of Java's built-in libraries are implemented in Java itself. So, leaving performance optimization aside, implementing a PHP virtual machine takes much more work than implementing a JVM. For example, someone implemented a JVM, Doppio, in 8,000 lines of TypeScript.

HHVM's solution to this problem is simple: implement only what Facebook uses, and reuse what was already written for HPHPc, so the problem is not big.

Implementing the interpreter

The next step is implementing the interpreter. After parsing PHP, HHVM generates a bytecode of its own design, which is stored in … The main body of the interpreter is implemented in bytecode.cpp. For methods such as … <code class="c++"><div class="blockcode">
<div id="code_oM7"><ol>
<li>if (c2.m_type == KindOfInt64) return o(c1.m_data.num, c2.m_data.num);</li>
<li>if (c2.m_type == KindOfDouble) return o(c1.m_data.num, c2.m_data.dbl);</li>
</ol></div>
</div> Thanks to the interpreter, HHVM supports PHP syntax significantly better than HPHPc, and in theory it can be fully compatible with official PHP. But doing only this would not make it much faster than Zend. Because variable types cannot be determined in advance, conditional checks like the ones above have to be added, and such code is unfriendly to the execution optimizations of modern CPUs. Another problem is that all data is boxed, so every read has to go indirectly through something like m_data.num and m_data.dbl.

Someone experimented with LLVM in 2008, and the result was 21 times slower than the original... In 2010, IBM's Japan research lab developed P9 based on their JVM code. Its performance is 2.5 to 9.5 times that of official PHP. You can read their paper Evaluation of a just-in-time compiler retrofitted for PHP.
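The tag checks and boxed reads described above can be sketched roughly as follows. This is a simplified illustration, assuming a value that is either an integer or a double. `TypedValue` here is a hypothetical stand-in that reuses the field names from the snippet (`m_type`, `m_data.num`, `m_data.dbl`), and `makeInt`, `makeDouble` and `genericAdd` are made-up helpers, not HHVM functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified boxed value: a type tag plus a union payload.
enum DataType : std::uint8_t { KindOfInt64, KindOfDouble };

struct TypedValue {
  union {
    std::int64_t num;
    double dbl;
  } m_data;
  DataType m_type;
};

inline TypedValue makeInt(std::int64_t v) {
  TypedValue tv;
  tv.m_data.num = v;
  tv.m_type = KindOfInt64;
  return tv;
}

inline TypedValue makeDouble(double v) {
  TypedValue tv;
  tv.m_data.dbl = v;
  tv.m_type = KindOfDouble;
  return tv;
}

// Generic addition the way an interpreter has to do it: inspect both tags at
// run time, then read each payload indirectly through the union.
// Integer overflow handling is deliberately omitted from this sketch.
inline TypedValue genericAdd(const TypedValue& c1, const TypedValue& c2) {
  if (c1.m_type == KindOfInt64) {
    if (c2.m_type == KindOfInt64) {
      return makeInt(c1.m_data.num + c2.m_data.num);
    }
    assert(c2.m_type == KindOfDouble);
    return makeDouble(static_cast<double>(c1.m_data.num) + c2.m_data.dbl);
  }
  assert(c1.m_type == KindOfDouble);
  if (c2.m_type == KindOfInt64) {
    return makeDouble(c1.m_data.dbl + static_cast<double>(c2.m_data.num));
  }
  return makeDouble(c1.m_data.dbl + c2.m_data.dbl);
}
```

Even for a single addition, the generic path runs up to two tag comparisons before any arithmetic happens, and the result has to be boxed again.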
In 2011, Andrei Homescu built one on RPython and wrote the paper HappyJIT: a tracing JIT compiler for PHP, but the test results were mixed and not ideal.

So what exactly is a JIT, and how do you implement one?

Dynamic languages almost always have an eval method that takes a string and executes it. A JIT does something similar, except that what it splices together is not strings but machine code for the target platform, which it then executes. How would you do that in C? You can refer to the introductory example written by Eli. Here is a piece of code from that article:

<div class="blockcode">
<div id="code_JSG"><ol>
<li>unsigned char code[] = {</li>
<li>  0x48, 0x89, 0xf8,       // mov %rdi, %rax</li>
<li>  0x48, 0x83, 0xc0, 0x04, // add $4, %rax</li>
<li>  0xc3                    // ret</li>
<li>};</li>
<li>memcpy(m, code, sizeof(code));</li>
</ol></div>
</div>

However, hand-writing machine code is error-prone, so it is best to use a helper library such as Mozilla's Nanojit or LuaJIT's DynASM. HHVM uses neither; it implements its own, which supports only x64 (it is also trying to use VIXL to generate 64-bit ARM code), and makes the code executable through mprotect.

But why is JIT-generated code faster? Think about it: code written in C++ is also eventually compiled into machine code. If the same code were merely translated into machine code by hand, how would it differ from what GCC generates? Although we mentioned some optimization techniques based on how CPUs work earlier, the more important optimization in a JIT is generating specialized instructions based on types, which greatly reduces the number of instructions and conditional checks. The following picture from TraceMonkey makes a very intuitive comparison. We will see a concrete example in HHVM later: ![]()

HHVM starts out executing through the interpreter, so when does it use the JIT? There are 2 common JIT trigger conditions:
As to which of the two is better, there is a thread on Lambda the Ultimate that drew discussion from many experts, notably Mike Pall (LuaJIT author), Andreas Gal (Mozilla VP) and Brendan Eich (Mozilla CTO), who shared many of their own views. I recommend reading it, so I will not show off here. The difference between the two is not only the compilation scope but also many details, such as the handling of local variables, which will not be discussed here.

But HHVM uses neither of these methods. Instead, it created its own approach called the tracelet, which is divided according to type. See the picture below: ![]()

You can see that it divides a function into 3 parts. The upper 2 parts handle two different situations where …

Of course, achieving a high-performance JIT takes many experiments and optimizations. For example, HHVM initially placed a newly added tracelet at the front, that is, positions A and C in the picture above would be swapped. Later the team tried placing it at the back, and performance improved by 14%, because tests found that this made it easier to hit the matching type early.

The JIT's execution process is to first convert HHBC into SSA (hhbc-translator.cpp), then optimize the SSA (for example with copy propagation), and then generate native machine code. On x64, for example, this is implemented by translator-x64.cpp.

Let's use a simple example to see what machine code HHVM finally generates, for the following PHP function: <div class="blockcode">
<div id="code_B9S"><ol>
<li>&lt;?php</li>
<li>function a($b){</li>
<li>  echo $b + 2;</li>
<li>}</li>
</ol></div>
</div>
This is what it looks like after compilation:

<div class="blockcode">
<div id="code_ZLy"><ol>
<li>mov rcx,0x7200000</li>
<li>mov rdi,rbp</li>
<li>mov rsi,rbx</li>
<li>mov rdx,0x20</li>
<li>call 0x2651dfb &lt;HPHP::Transl::traceCallback&gt;</li>
<li>cmp BYTE PTR [rbp-0x8],0xa</li>
<li>jne 0xae00306</li>
<li>; The instructions above check whether the parameter is valid</li>
<li></li>
<li>mov rcx,QWORD PTR [rbp-0x10] ; here %rcx is assigned the value 1</li>
<li>mov edi,0x2 ; assign 2 to %edi (the lower 32 bits of %rdi)</li>
<li>add rdi,rcx ; add %rcx</li>
<li>call 0x2131f1b &lt;HPHP::print_int&gt; ; call print_int; by now the first argument %rdi is already 3</li>
<li></li>
<li>; The rest is not discussed here</li>
<li>mov BYTE PTR [rbp+0x28],0x8</li>
<li>lea rbx,[rbp+0x20]</li>
<li>test BYTE PTR [r12],0xff</li>
<li>jne 0xae0032a</li>
<li>push QWORD PTR [rbp+0x8]</li>
</ol></div>
</div>

The HPHP::print_int function is implemented like this:

<div class="blockcode">
<div id="code_K6f"><ol>
<li>void print_int(int64_t i) {</li>
<li>  char buf[256];</li>
<li>  snprintf(buf, 256, "%" PRId64, i);</li>
<li>  echo(buf);</li>
<li>  TRACE(1, "t-x64 output(int): %" PRId64 "\n", i);</li>
<li>}</li>
</ol></div>
</div>

You can see that the code compiled by HHVM directly uses …

The following startup option sets the number of JIT warm-up requests to zero:

<div class="blockcode">
<div id="code_K70"><ol><li>-v Eval.JitWarmupRequests=0</li></ol></div>
</div>

A Hack example with typed properties and parameters:

<div class="blockcode">
<div id="code_biL"><ol>
<li>&lt;?hh</li>
<li>class Point2 {</li>
<li>  public float $x, $y;</li>
<li>  function __construct(float $x, float $y) {</li>
<li>    $this->x = $x;</li>
<li>    $this->y = $y;</li>
<li>  }</li>
<li>}</li>
<li>// From: https://raw.github.com/strangeloop/StrangeLoop2013/master/slides/sessions/Adams-TakingPHPSeriously.pdf</li>
</ol></div>
</div>

Notice …
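As a rough illustration of the specialization visible in the compiled output earlier (a type check at entry, then a plain integer add), the sketch below contrasts a guarded fast path with a generic fallback for `$b + 2`. All names here (`Value`, `Tag`, `Result`, `addTwo`, `addTwoInt`, `boxInt`, `boxDbl`) are hypothetical. Real JIT output is machine code generated at run time, which this C++ only imitates.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tagged value, for illustration only.
enum class Tag : std::uint8_t { Int, Dbl };

struct Value {
  Tag tag;
  union {
    std::int64_t i;
    double d;
  };
};

inline Value boxInt(std::int64_t x) {
  Value v;
  v.tag = Tag::Int;
  v.i = x;
  return v;
}

inline Value boxDbl(double x) {
  Value v;
  v.tag = Tag::Dbl;
  v.d = x;
  return v;
}

struct Result {
  bool usedFastPath;  // true when the int-specialized body ran
  Value value;
};

// Body specialized for an int argument: once the guard has passed, there are
// no further tag checks and the add works on a raw machine integer.
inline std::int64_t addTwoInt(std::int64_t b) { return b + 2; }

// Models `$b + 2`: one guard at entry selects the specialized body, and any
// other type falls back to a generic path.
inline Result addTwo(const Value& b) {
  if (b.tag == Tag::Int) {  // guard, comparable to the cmp/jne pair
    return Result{true, boxInt(addTwoInt(b.i))};
  }
  assert(b.tag == Tag::Dbl);
  return Result{false, boxDbl(b.d + 2.0)};
}
```

When the guard passes, the work left is a single integer addition. When type information is available up front, as with typed code, such a guard has less to discover at run time; how HHVM actually exploits this is not shown here.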
Added in January 2014: HHVM is currently gaining adoption in-house at a very good pace. I recommend that everyone try it in 2014, especially now that its compatibility test pass rate has reached 98.58% and the cost of modification has dropped further.
How Three Guys Rebuilt the Foundation of Facebook
PHP on the Metal with HHVM
Making HPHPi Faster
HHVM Optimization Tips
The HipHop Virtual Machine (hhvm) PHP Execution at the Speed of JIT
Julien Verlaguet, Facebook: Analyzing PHP statically
Speeding up PHP-based development with HHVM
Adding an opcode to HHBC
