Home  >  Article  >  php教程  >  【性能为王】从PHP源码剖析array_keys和array_unique,arraykeys

【性能为王】从PHP源码剖析array_keys和array_unique,arraykeys

WBOY
WBOYOriginal
2016-06-13 08:45:421317browse

【性能为王】从PHP源码剖析array_keys和array_unique,arraykeys

之前在[译]更快的方式实现PHP数组去重这篇文章里讨论了使用array_flip后再调用array_keys函数替换直接调用array_unique函数实现数组去重性能较好。由于原文没有给出源码分析和测试的结果,导致给读者造成迷惑,在此说声抱歉。为了解开读者的疑惑,笔者承诺了会补上源码的分析,于是花了一些时间去研究PHP的源码,现在此补上详细的说明。

 

性能分析 

从运行性能上分析,看看下面的测试代码:

<span>$test</span>=<span>array</span><span>();
</span><span>for</span>(<span>$run</span>=0; <span>$run</span><10000; <span>$run</span>++<span>)
</span><span>$test</span>[]=<span>rand</span>(0,100<span>);

</span><span>$time</span>=<span>microtime</span>(<span>true</span><span>);

</span><span>$out</span> = <span>array_unique</span>(<span>$test</span><span>);

</span><span>$time</span>=<span>microtime</span>(<span>true</span>)-<span>$time</span><span>;
</span><span>echo</span> 'Array Unique: '.<span>$time</span>."\n"<span>;

</span><span>$time</span>=<span>microtime</span>(<span>true</span><span>);

</span><span>$out</span>=<span>array_keys</span>(<span>array_flip</span>(<span>$test</span><span>));

</span><span>$time</span>=<span>microtime</span>(<span>true</span>)-<span>$time</span><span>;
</span><span>echo</span> 'Keys Flip: '.<span>$time</span>."\n"<span>;

</span><span>$time</span>=<span>microtime</span>(<span>true</span><span>);

</span><span>$out</span>=<span>array_flip</span>(<span>array_flip</span>(<span>$test</span><span>));

</span><span>$time</span>=<span>microtime</span>(<span>true</span>)-<span>$time</span><span>;
</span><span>echo</span> 'Flip Flip: '.<span>$time</span>."\n";

 

运行结果如下:

从上图可以看到,使用array_unique函数需要0.069s;使用array_flip后再使用array_keys函数需要0.00152s;使用两次array_flip函数需要0.00146s。

测试结果表明,使用array_flip后再调用array_keys函数比array_unique函数快。那么,具体原因是什么呢?让我们看看在PHP底层,这两个函数是怎么实现的。

 

源码分析

<span> 1</span> <span>/*</span><span> {{{ proto array array_keys(array input [, mixed search_value[, bool strict]])
</span><span> 2</span> <span>   Return just the keys from the input array, optionally only for the specified search_value </span><span>*/</span>
<span> 3</span> <span>PHP_FUNCTION(array_keys)
</span><span> 4</span> <span>{
</span><span> 5</span>     <span>//</span><span>变量定义</span>
<span> 6</span>     zval *input,                <span>/*</span><span> Input array </span><span>*/</span>
<span> 7</span>          *search_value = NULL,    <span>/*</span><span> Value to search for </span><span>*/</span>
<span> 8</span>          **entry,                <span>/*</span><span> An entry in the input array </span><span>*/</span>
<span> 9</span>            res,                    <span>/*</span><span> Result of comparison </span><span>*/</span>
<span>10</span>           *new_val;                <span>/*</span><span> New value </span><span>*/</span>
<span>11</span>     <span>int</span>    add_key;                <span>/*</span><span> Flag to indicate whether a key should be added </span><span>*/</span>
<span>12</span>     <span>char</span>  *string_key;            <span>/*</span><span> String key </span><span>*/</span>
<span>13</span>     <span>uint</span><span>   string_key_len;
</span><span>14</span>     <span>ulong</span>  num_key;                <span>/*</span><span> Numeric key </span><span>*/</span>
<span>15</span>     zend_bool strict = <span>0</span>;        <span>/*</span><span> do strict comparison </span><span>*/</span>
<span>16</span> <span>    HashPosition pos;
</span><span>17</span>     <span>int</span> (*is_equal_func)(zval *, zval *, zval * TSRMLS_DC) =<span> is_equal_function;
</span><span>18</span> 
<span>19</span>     <span>//</span><span>程序解析参数</span>
<span>20</span>     <span>if</span> (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, <span>"</span><span>a|zb</span><span>"</span>, &input, &search_value, &strict) ==<span> FAILURE) {
</span><span>21</span>         <span>return</span><span>;
</span><span>22</span> <span>    }
</span><span>23</span> 
<span>24</span>     <span>//</span><span> 如果strict是true,则设置is_equal_func为is_identical_function,即全等比较</span>
<span>25</span>     <span>if</span><span> (strict) {
</span><span>26</span>         is_equal_func =<span> is_identical_function;
</span><span>27</span> <span>    }
</span><span>28</span> 
<span>29</span>     <span>/*</span><span> 根据search_vale初始化返回的数组大小 </span><span>*/</span>
<span>30</span>     <span>if</span> (search_value !=<span> NULL) {
</span><span>31</span> <span>        array_init(return_value);
</span><span>32</span>     } <span>else</span><span> {
</span><span>33</span> <span>        array_init_size(return_value, zend_hash_num_elements(Z_ARRVAL_P(input)));
</span><span>34</span> <span>    }
</span><span>35</span>     add_key = <span>1</span><span>;
</span><span>36</span> 
<span>37</span>     <span>/*</span><span> 遍历输入的数组参数,然后添加键值到返回的数组 </span><span>*/</span>
<span>38</span>     zend_hash_internal_pointer_reset_ex(Z_ARRVAL_P(input), &pos);<span>//</span><span>重置指针
</span><span>39</span>     <span>//</span><span>循环遍历数组</span>
<span>40</span>     <span>while</span> (zend_hash_get_current_data_ex(Z_ARRVAL_P(input), (<span>void</span> **)&entry, &pos) ==<span> SUCCESS) {
</span><span>41</span>         <span>//</span><span> 如果search_value不为空</span>
<span>42</span>         <span>if</span> (search_value !=<span> NULL) {
</span><span>43</span>             <span>//</span><span> 判断search_value与当前的值是否相同,并将比较结果保存到add_key变量</span>
<span>44</span>             is_equal_func(&res, search_value, *<span>entry TSRMLS_CC);
</span><span>45</span>             add_key = zval_is_true(&<span>res);
</span><span>46</span> <span>        }
</span><span>47</span> 
<span>48</span>         <span>if</span><span> (add_key) {
</span><span>49</span>             <span>//</span><span> 创建一个zval结构体</span>
<span>50</span> <span>            MAKE_STD_ZVAL(new_val);
</span><span>51</span> 
<span>52</span>             <span>//</span><span> 根据键值是字符串还是整型数字将值插入到return_value中</span>
<span>53</span>             <span>switch</span> (zend_hash_get_current_key_ex(Z_ARRVAL_P(input), &string_key, &string_key_len, &num_key, <span>1</span>, &<span>pos)) {
</span><span>54</span>                 <span>case</span><span> HASH_KEY_IS_STRING:
</span><span>55</span>                     ZVAL_STRINGL(new_val, string_key, string_key_len - <span>1</span>, <span>0</span><span>);
</span><span>56</span>                     <span>//</span><span> 此函数负责将值插入到return_value中,如果键值已存在,则使用新值更新对应的值,否则直接插入</span>
<span>57</span>                     zend_hash_next_index_insert(Z_ARRVAL_P(return_value), &new_val, <span>sizeof</span>(zval *<span>), NULL);
</span><span>58</span>                     <span>break</span><span>;
</span><span>59</span> 
<span>60</span>                 <span>case</span><span> HASH_KEY_IS_LONG:
</span><span>61</span>                     Z_TYPE_P(new_val) =<span> IS_LONG;
</span><span>62</span>                     Z_LVAL_P(new_val) =<span> num_key;
</span><span>63</span>                     zend_hash_next_index_insert(Z_ARRVAL_P(return_value), &new_val, <span>sizeof</span>(zval *<span>), NULL);
</span><span>64</span>                     <span>break</span><span>;
</span><span>65</span> <span>            }
</span><span>66</span> <span>        }
</span><span>67</span> 
<span>68</span>         <span>//</span><span> 移动到下一个</span>
<span>69</span>         zend_hash_move_forward_ex(Z_ARRVAL_P(input), &<span>pos);
</span><span>70</span> <span>    }
</span><span>71</span> <span>}
</span><span>72</span> <span>/*</span><span> }}} </span><span>*/</span>

以上是array_keys函数底层的源码。为方便理解,笔者添加了一些中文注释。如果需要查看原始代码,可以点击查看。这个函数的功能就是新建一个临时数组,然后将键值对重新复制到新的数组,如果复制过程中有重复的键值出现,那么就用新的值替换。这个函数的主要步骤是地57和63行调用的zend_hash_next_index_insert函数。该函数将元素插入到数组中,如果出现重复的值,则使用新的值更新原键值指向的值,否则直接插入,时间复杂度是O(n)。

 

<span> 1</span> <span>/*</span><span> {{{ proto array array_flip(array input)
</span><span> 2</span> <span>   Return array with key <-> value flipped </span><span>*/</span>
<span> 3</span> <span>PHP_FUNCTION(array_flip)
</span><span> 4</span> <span>{
</span><span> 5</span>     <span>//</span><span> 定义变量</span>
<span> 6</span>     zval *array, **entry, *<span>data;
</span><span> 7</span>     <span>char</span> *<span>string_key;
</span><span> 8</span>     <span>uint</span><span> str_key_len;
</span><span> 9</span>     <span>ulong</span><span> num_key;
</span><span>10</span> <span>    HashPosition pos;
</span><span>11</span> 
<span>12</span>     <span>//</span><span> 解析数组参数</span>
<span>13</span>     <span>if</span> (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, <span>"</span><span>a</span><span>"</span>, &array) ==<span> FAILURE) {
</span><span>14</span>         <span>return</span><span>;
</span><span>15</span> <span>    }
</span><span>16</span> 
<span>17</span>     <span>//</span><span> 初始化返回数组</span>
<span>18</span> <span>    array_init_size(return_value, zend_hash_num_elements(Z_ARRVAL_P(array)));
</span><span>19</span> 
<span>20</span>     <span>//</span><span> 重置指针</span>
<span>21</span>     zend_hash_internal_pointer_reset_ex(Z_ARRVAL_P(array), &<span>pos);
</span><span>22</span>     <span>//</span><span> 遍历每个元素,并执行键<->值交换操作</span>
<span>23</span>     <span>while</span> (zend_hash_get_current_data_ex(Z_ARRVAL_P(array), (<span>void</span> **)&entry, &pos) ==<span> SUCCESS) {
</span><span>24</span>         <span>//</span><span> 初始化一个结构体</span>
<span>25</span> <span>        MAKE_STD_ZVAL(data);
</span><span>26</span>         <span>//</span><span> 将原数组的值赋值为新数组的键</span>
<span>27</span>         <span>switch</span> (zend_hash_get_current_key_ex(Z_ARRVAL_P(array), &string_key, &str_key_len, &num_key, <span>1</span>, &<span>pos)) {
</span><span>28</span>             <span>case</span><span> HASH_KEY_IS_STRING:
</span><span>29</span>                 ZVAL_STRINGL(data, string_key, str_key_len - <span>1</span>, <span>0</span><span>);
</span><span>30</span>                 <span>break</span><span>;
</span><span>31</span>             <span>case</span><span> HASH_KEY_IS_LONG:
</span><span>32</span>                 Z_TYPE_P(data) =<span> IS_LONG;
</span><span>33</span>                 Z_LVAL_P(data) =<span> num_key;
</span><span>34</span>                 <span>break</span><span>;
</span><span>35</span> <span>        }
</span><span>36</span> 
<span>37</span>         <span>//</span><span> 将原数组的键赋值为新数组的值,如果有重复的,则使用新值覆盖旧值</span>
<span>38</span>         <span>if</span> (Z_TYPE_PP(entry) ==<span> IS_LONG) {
</span><span>39</span>             zend_hash_index_update(Z_ARRVAL_P(return_value), Z_LVAL_PP(entry), &data, <span>sizeof</span><span>(data), NULL);
</span><span>40</span>         } <span>else</span> <span>if</span> (Z_TYPE_PP(entry) ==<span> IS_STRING) {
</span><span>41</span>             zend_symtable_update(Z_ARRVAL_P(return_value), Z_STRVAL_PP(entry), Z_STRLEN_PP(entry) + <span>1</span>, &data, <span>sizeof</span><span>(data), NULL);
</span><span>42</span>         } <span>else</span><span> {
</span><span>43</span>             zval_ptr_dtor(&data); <span>/*</span><span> will free also zval structure </span><span>*/</span>
<span>44</span>             php_error_docref(NULL TSRMLS_CC, E_WARNING, <span>"</span><span>Can only flip STRING and INTEGER values!</span><span>"</span><span>);
</span><span>45</span> <span>        }
</span><span>46</span> 
<span>47</span>         <span>//</span><span> 下一个</span>
<span>48</span>         zend_hash_move_forward_ex(Z_ARRVAL_P(array), &<span>pos);
</span><span>49</span> <span>    }
</span><span>50</span> <span>}
</span><span>51</span> <span>/*</span><span> }}} </span><span>*/</span>

 

上面就是是array_flip函数的源码。点击链接查看原始代码。这个函数主要的做的事情就是创建一个新的数组,遍历原数组。在26行开始将原数组的值赋值为新数组的键,然后在37行开始将原数组的键赋值为新数组的值,如果有重复的,则使用新值覆盖旧值。整个函数的时间复杂度也是O(n)。因此,使用了array_flip之后再使用array_keys的时间复杂度是O(n)。

 

接下来,我们看看array_unique函数的源码。点击链接查看原始代码。

<span> 1</span> <span>/*</span><span> {{{ proto array array_unique(array input [, int sort_flags])
</span><span> 2</span> <span>   Removes duplicate values from array </span><span>*/</span>
<span> 3</span> <span>PHP_FUNCTION(array_unique)
</span><span> 4</span> <span>{
</span><span> 5</span>     <span>//</span><span> 定义变量</span>
<span> 6</span>     zval *array, *<span>tmp;
</span><span> 7</span>     Bucket *<span>p;
</span><span> 8</span>     <span>struct</span><span> bucketindex {
</span><span> 9</span>         Bucket *<span>b;
</span><span>10</span>         unsigned <span>int</span><span> i;
</span><span>11</span> <span>    };
</span><span>12</span>     <span>struct</span> bucketindex *arTmp, *cmpdata, *<span>lastkept;
</span><span>13</span>     unsigned <span>int</span><span> i;
</span><span>14</span>     <span>long</span> sort_type =<span> PHP_SORT_STRING;
</span><span>15</span> 
<span>16</span>     <span>//</span><span> 解析参数</span>
<span>17</span>     <span>if</span> (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, <span>"</span><span>a|l</span><span>"</span>, &array, &sort_type) ==<span> FAILURE) {
</span><span>18</span>         <span>return</span><span>;
</span><span>19</span> <span>    }
</span><span>20</span> 
<span>21</span>     <span>//</span><span> 设置比较函数</span>
<span>22</span> <span>    php_set_compare_func(sort_type TSRMLS_CC);
</span><span>23</span> 
<span>24</span>     <span>//</span><span> 初始化返回数组</span>
<span>25</span> <span>    array_init_size(return_value, zend_hash_num_elements(Z_ARRVAL_P(array)));
</span><span>26</span>     <span>//</span><span> 将值拷贝到新数组</span>
<span>27</span>     zend_hash_copy(Z_ARRVAL_P(return_value), Z_ARRVAL_P(array), (copy_ctor_func_t) zval_add_ref, (<span>void</span> *)&tmp, <span>sizeof</span>(zval*<span>));
</span><span>28</span> 
<span>29</span>     <span>if</span> (Z_ARRVAL_P(array)->nNumOfElements <= <span>1</span>) {    <span>/*</span><span> 什么都不做 </span><span>*/</span>
<span>30</span>         <span>return</span><span>;
</span><span>31</span> <span>    }
</span><span>32</span> 
<span>33</span>     <span>/*</span><span> 根据target_hash buckets的指针创建数组并排序 </span><span>*/</span>
<span>34</span>     arTmp = (<span>struct</span> bucketindex *) pemalloc((Z_ARRVAL_P(array)->nNumOfElements + <span>1</span>) * <span>sizeof</span>(<span>struct</span> bucketindex), Z_ARRVAL_P(array)-><span>persistent);
</span><span>35</span>     <span>if</span> (!<span>arTmp) {
</span><span>36</span> <span>        zval_dtor(return_value);
</span><span>37</span> <span>        RETURN_FALSE;
</span><span>38</span> <span>    }
</span><span>39</span>     <span>for</span> (i = <span>0</span>, p = Z_ARRVAL_P(array)->pListHead; p; i++, p = p-><span>pListNext) {
</span><span>40</span>         arTmp[i].b =<span> p;
</span><span>41</span>         arTmp[i].i =<span> i;
</span><span>42</span> <span>    }
</span><span>43</span>     arTmp[i].b =<span> NULL;
</span><span>44</span>     <span>//</span><span> 排序</span>
<span>45</span>     zend_qsort((<span>void</span> *) arTmp, i, <span>sizeof</span>(<span>struct</span><span> bucketindex), php_array_data_compare TSRMLS_CC);
</span><span>46</span> 
<span>47</span>     <span>/*</span><span> 遍历排序好的数组,然后删除重复的元素 </span><span>*/</span>
<span>48</span>     lastkept =<span> arTmp;
</span><span>49</span>     <span>for</span> (cmpdata = arTmp + <span>1</span>; cmpdata->b; cmpdata++<span>) {
</span><span>50</span>         <span>if</span><span> (php_array_data_compare(lastkept, cmpdata TSRMLS_CC)) {
</span><span>51</span>             lastkept =<span> cmpdata;
</span><span>52</span>         } <span>else</span><span> {
</span><span>53</span>             <span>if</span> (lastkept->i > cmpdata-><span>i) {
</span><span>54</span>                 p = lastkept-><span>b;
</span><span>55</span>                 lastkept =<span> cmpdata;
</span><span>56</span>             } <span>else</span><span> {
</span><span>57</span>                 p = cmpdata-><span>b;
</span><span>58</span> <span>            }
</span><span>59</span>             <span>if</span> (p->nKeyLength == <span>0</span><span>) {
</span><span>60</span>                 zend_hash_index_del(Z_ARRVAL_P(return_value), p-><span>h);
</span><span>61</span>             } <span>else</span><span> {
</span><span>62</span>                 <span>if</span> (Z_ARRVAL_P(return_value) == &<span>EG(symbol_table)) {
</span><span>63</span>                     zend_delete_global_variable(p->arKey, p->nKeyLength - <span>1</span><span> TSRMLS_CC);
</span><span>64</span>                 } <span>else</span><span> {
</span><span>65</span>                     zend_hash_quick_del(Z_ARRVAL_P(return_value), p->arKey, p->nKeyLength, p-><span>h);
</span><span>66</span> <span>                }
</span><span>67</span> <span>            }
</span><span>68</span> <span>        }
</span><span>69</span> <span>    }
</span><span>70</span>     pefree(arTmp, Z_ARRVAL_P(array)-><span>persistent);
</span><span>71</span> <span>}
</span><span>72</span> <span>/*</span><span> }}} </span><span>*/</span>

可以看到,这个函数初始化一个新的数组,然后将值拷贝到新数组,然后在45行调用排序函数对数组进行排序,排序的算法是zend引擎的块树排序算法。接着遍历排序好的数组,删除重复的元素。整个函数开销最大的地方就在调用排序函数上,而快排的时间复杂度是O(n*logn),因此,该函数的时间复杂度是O(n*logn)。

结论

因为array_unique底层调用了快排算法,加大了函数运行的时间开销,导致整个函数的运行较慢。这就是为什么array_keys比array_unique函数更快的原因。

 

原创文章,文笔有限,才疏学浅,文中若有不正之处,万望告知。

如果本文对你有帮助,请点下推荐,写文章不容易。

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn