Home  >  Article  >  Backend Development  >  [Performance is King] Analyzing array_keys and array_unique from PHP source code, arraykeys_PHP tutorial

[Performance is King] Analyzing array_keys and array_unique from PHP source code, arraykeys_PHP tutorial

WBOY
WBOYOriginal
2016-07-12 08:58:291090browse

[Performance is King] Analyzing array_keys and array_unique from PHP source code, arraykeys

was previously discussed in [Translation] A faster way to implement PHP array deduplication in this article After array_flip, call the array_keys function instead of directly calling the array_unique function to achieve better array deduplication performance. Since the original article did not give the results of source code analysis and testing, which caused confusion to readers, I would like to apologize. In order to solve readers' doubts, the author promised to add source code analysis, so I spent some time studying the PHP source code, and now I will add detailed instructions.

Performance Analysis

From the perspective of running performance, take a look at the following test code:

<span>$test</span>=<span>array</span><span>();
</span><span>for</span>(<span>$run</span>=0; <span>$run</span><10000; <span>$run</span>++<span>)
</span><span>$test</span>[]=<span>rand</span>(0,100<span>);

</span><span>$time</span>=<span>microtime</span>(<span>true</span><span>);

</span><span>$out</span> = <span>array_unique</span>(<span>$test</span><span>);

</span><span>$time</span>=<span>microtime</span>(<span>true</span>)-<span>$time</span><span>;
</span><span>echo</span> 'Array Unique: '.<span>$time</span>."\n"<span>;

</span><span>$time</span>=<span>microtime</span>(<span>true</span><span>);

</span><span>$out</span>=<span>array_keys</span>(<span>array_flip</span>(<span>$test</span><span>));

</span><span>$time</span>=<span>microtime</span>(<span>true</span>)-<span>$time</span><span>;
</span><span>echo</span> 'Keys Flip: '.<span>$time</span>."\n"<span>;

</span><span>$time</span>=<span>microtime</span>(<span>true</span><span>);

</span><span>$out</span>=<span>array_flip</span>(<span>array_flip</span>(<span>$test</span><span>));

</span><span>$time</span>=<span>microtime</span>(<span>true</span>)-<span>$time</span><span>;
</span><span>echo</span> 'Flip Flip: '.<span>$time</span>."\n";

The running results are as follows:

As you can see from the picture above, using the array_unique function takes 0.069s; using array_flip and then using the array_keys function takes 0.00152s; using the array_flip function twice takes 0.00146s.

The test results show that using array_flip and then calling the array_keys function is faster than the array_unique function. So, what is the specific reason? Let's take a look at how these two functions are implemented at the bottom of PHP.

Source code analysis

<span> 1</span> <span>/*</span><span> {{{ proto array array_keys(array input [, mixed search_value[, bool strict]])
</span><span> 2</span> <span>   Return just the keys from the input array, optionally only for the specified search_value </span><span>*/</span>
<span> 3</span> <span>PHP_FUNCTION(array_keys)
</span><span> 4</span> <span>{
</span><span> 5</span>     <span>//</span><span>变量定义</span>
<span> 6</span>     zval *input,                <span>/*</span><span> Input array </span><span>*/</span>
<span> 7</span>          *search_value = NULL,    <span>/*</span><span> Value to search for </span><span>*/</span>
<span> 8</span>          **entry,                <span>/*</span><span> An entry in the input array </span><span>*/</span>
<span> 9</span>            res,                    <span>/*</span><span> Result of comparison </span><span>*/</span>
<span>10</span>           *new_val;                <span>/*</span><span> New value </span><span>*/</span>
<span>11</span>     <span>int</span>    add_key;                <span>/*</span><span> Flag to indicate whether a key should be added </span><span>*/</span>
<span>12</span>     <span>char</span>  *string_key;            <span>/*</span><span> String key </span><span>*/</span>
<span>13</span>     <span>uint</span><span>   string_key_len;
</span><span>14</span>     <span>ulong</span>  num_key;                <span>/*</span><span> Numeric key </span><span>*/</span>
<span>15</span>     zend_bool strict = <span>0</span>;        <span>/*</span><span> do strict comparison </span><span>*/</span>
<span>16</span> <span>    HashPosition pos;
</span><span>17</span>     <span>int</span> (*is_equal_func)(zval *, zval *, zval * TSRMLS_DC) =<span> is_equal_function;
</span><span>18</span> 
<span>19</span>     <span>//</span><span>程序解析参数</span>
<span>20</span>     <span>if</span> (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, <span>"</span><span>a|zb</span><span>"</span>, &input, &search_value, &strict) ==<span> FAILURE) {
</span><span>21</span>         <span>return</span><span>;
</span><span>22</span> <span>    }
</span><span>23</span> 
<span>24</span>     <span>//</span><span> 如果strict是true,则设置is_equal_func为is_identical_function,即全等比较</span>
<span>25</span>     <span>if</span><span> (strict) {
</span><span>26</span>         is_equal_func =<span> is_identical_function;
</span><span>27</span> <span>    }
</span><span>28</span> 
<span>29</span>     <span>/*</span><span> 根据search_vale初始化返回的数组大小 </span><span>*/</span>
<span>30</span>     <span>if</span> (search_value !=<span> NULL) {
</span><span>31</span> <span>        array_init(return_value);
</span><span>32</span>     } <span>else</span><span> {
</span><span>33</span> <span>        array_init_size(return_value, zend_hash_num_elements(Z_ARRVAL_P(input)));
</span><span>34</span> <span>    }
</span><span>35</span>     add_key = <span>1</span><span>;
</span><span>36</span> 
<span>37</span>     <span>/*</span><span> 遍历输入的数组参数,然后添加键值到返回的数组 </span><span>*/</span>
<span>38</span>     zend_hash_internal_pointer_reset_ex(Z_ARRVAL_P(input), &pos);<span>//</span><span>重置指针
</span><span>39</span>     <span>//</span><span>循环遍历数组</span>
<span>40</span>     <span>while</span> (zend_hash_get_current_data_ex(Z_ARRVAL_P(input), (<span>void</span> **)&entry, &pos) ==<span> SUCCESS) {
</span><span>41</span>         <span>//</span><span> 如果search_value不为空</span>
<span>42</span>         <span>if</span> (search_value !=<span> NULL) {
</span><span>43</span>             <span>//</span><span> 判断search_value与当前的值是否相同,并将比较结果保存到add_key变量</span>
<span>44</span>             is_equal_func(&res, search_value, *<span>entry TSRMLS_CC);
</span><span>45</span>             add_key = zval_is_true(&<span>res);
</span><span>46</span> <span>        }
</span><span>47</span> 
<span>48</span>         <span>if</span><span> (add_key) {
</span><span>49</span>             <span>//</span><span> 创建一个zval结构体</span>
<span>50</span> <span>            MAKE_STD_ZVAL(new_val);
</span><span>51</span> 
<span>52</span>             <span>//</span><span> 根据键值是字符串还是整型数字将值插入到return_value中</span>
<span>53</span>             <span>switch</span> (zend_hash_get_current_key_ex(Z_ARRVAL_P(input), &string_key, &string_key_len, &num_key, <span>1</span>, &<span>pos)) {
</span><span>54</span>                 <span>case</span><span> HASH_KEY_IS_STRING:
</span><span>55</span>                     ZVAL_STRINGL(new_val, string_key, string_key_len - <span>1</span>, <span>0</span><span>);
</span><span>56</span>                     <span>//</span><span> 此函数负责将值插入到return_value中,如果键值已存在,则使用新值更新对应的值,否则直接插入</span>
<span>57</span>                     zend_hash_next_index_insert(Z_ARRVAL_P(return_value), &new_val, <span>sizeof</span>(zval *<span>), NULL);
</span><span>58</span>                     <span>break</span><span>;
</span><span>59</span> 
<span>60</span>                 <span>case</span><span> HASH_KEY_IS_LONG:
</span><span>61</span>                     Z_TYPE_P(new_val) =<span> IS_LONG;
</span><span>62</span>                     Z_LVAL_P(new_val) =<span> num_key;
</span><span>63</span>                     zend_hash_next_index_insert(Z_ARRVAL_P(return_value), &new_val, <span>sizeof</span>(zval *<span>), NULL);
</span><span>64</span>                     <span>break</span><span>;
</span><span>65</span> <span>            }
</span><span>66</span> <span>        }
</span><span>67</span> 
<span>68</span>         <span>//</span><span> 移动到下一个</span>
<span>69</span>         zend_hash_move_forward_ex(Z_ARRVAL_P(input), &<span>pos);
</span><span>70</span> <span>    }
</span><span>71</span> <span>}
</span><span>72</span> <span>/*</span><span> }}} </span><span>*/</span>

The above is the underlying source code of array_keys function. To facilitate understanding, the author has added some Chinese comments. If you need to view the original code, you can click to view it. The function of this function is to create a temporary array, and then copy the key-value pairs to the new array. If duplicate key values ​​appear during the copying process, replace them with new values. The main step of this function is the zend_hash_next_index_insert function called on lines 57 and 63. This function inserts elements into the array. If a duplicate value appears, the new value is used to update the value pointed to by the original key value. Otherwise, it is inserted directly. The time complexity is O(n).

<span> 1</span> <span>/*</span><span> {{{ proto array array_flip(array input)
</span><span> 2</span> <span>   Return array with key <-> value flipped </span><span>*/</span>
<span> 3</span> <span>PHP_FUNCTION(array_flip)
</span><span> 4</span> <span>{
</span><span> 5</span>     <span>//</span><span> 定义变量</span>
<span> 6</span>     zval *array, **entry, *<span>data;
</span><span> 7</span>     <span>char</span> *<span>string_key;
</span><span> 8</span>     <span>uint</span><span> str_key_len;
</span><span> 9</span>     <span>ulong</span><span> num_key;
</span><span>10</span> <span>    HashPosition pos;
</span><span>11</span> 
<span>12</span>     <span>//</span><span> 解析数组参数</span>
<span>13</span>     <span>if</span> (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, <span>"</span><span>a</span><span>"</span>, &array) ==<span> FAILURE) {
</span><span>14</span>         <span>return</span><span>;
</span><span>15</span> <span>    }
</span><span>16</span> 
<span>17</span>     <span>//</span><span> 初始化返回数组</span>
<span>18</span> <span>    array_init_size(return_value, zend_hash_num_elements(Z_ARRVAL_P(array)));
</span><span>19</span> 
<span>20</span>     <span>//</span><span> 重置指针</span>
<span>21</span>     zend_hash_internal_pointer_reset_ex(Z_ARRVAL_P(array), &<span>pos);
</span><span>22</span>     <span>//</span><span> 遍历每个元素,并执行键<->值交换操作</span>
<span>23</span>     <span>while</span> (zend_hash_get_current_data_ex(Z_ARRVAL_P(array), (<span>void</span> **)&entry, &pos) ==<span> SUCCESS) {
</span><span>24</span>         <span>//</span><span> 初始化一个结构体</span>
<span>25</span> <span>        MAKE_STD_ZVAL(data);
</span><span>26</span>         <span>//</span><span> 将原数组的值赋值为新数组的键</span>
<span>27</span>         <span>switch</span> (zend_hash_get_current_key_ex(Z_ARRVAL_P(array), &string_key, &str_key_len, &num_key, <span>1</span>, &<span>pos)) {
</span><span>28</span>             <span>case</span><span> HASH_KEY_IS_STRING:
</span><span>29</span>                 ZVAL_STRINGL(data, string_key, str_key_len - <span>1</span>, <span>0</span><span>);
</span><span>30</span>                 <span>break</span><span>;
</span><span>31</span>             <span>case</span><span> HASH_KEY_IS_LONG:
</span><span>32</span>                 Z_TYPE_P(data) =<span> IS_LONG;
</span><span>33</span>                 Z_LVAL_P(data) =<span> num_key;
</span><span>34</span>                 <span>break</span><span>;
</span><span>35</span> <span>        }
</span><span>36</span> 
<span>37</span>         <span>//</span><span> 将原数组的键赋值为新数组的值,如果有重复的,则使用新值覆盖旧值</span>
<span>38</span>         <span>if</span> (Z_TYPE_PP(entry) ==<span> IS_LONG) {
</span><span>39</span>             zend_hash_index_update(Z_ARRVAL_P(return_value), Z_LVAL_PP(entry), &data, <span>sizeof</span><span>(data), NULL);
</span><span>40</span>         } <span>else</span> <span>if</span> (Z_TYPE_PP(entry) ==<span> IS_STRING) {
</span><span>41</span>             zend_symtable_update(Z_ARRVAL_P(return_value), Z_STRVAL_PP(entry), Z_STRLEN_PP(entry) + <span>1</span>, &data, <span>sizeof</span><span>(data), NULL);
</span><span>42</span>         } <span>else</span><span> {
</span><span>43</span>             zval_ptr_dtor(&data); <span>/*</span><span> will free also zval structure </span><span>*/</span>
<span>44</span>             php_error_docref(NULL TSRMLS_CC, E_WARNING, <span>"</span><span>Can only flip STRING and INTEGER values!</span><span>"</span><span>);
</span><span>45</span> <span>        }
</span><span>46</span> 
<span>47</span>         <span>//</span><span> 下一个</span>
<span>48</span>         zend_hash_move_forward_ex(Z_ARRVAL_P(array), &<span>pos);
</span><span>49</span> <span>    }
</span><span>50</span> <span>}
</span><span>51</span> <span>/*</span><span> }}} </span><span>*/</span>

The above is the source code of array_flip function. Click the link to view the original code. The main thing this function does is to create a new array and traverse the original array. At line 26, the values ​​of the original array are assigned to the keys of the new array, and then at line 37, the keys of the original array are assigned to the values ​​of the new array. If there are duplicates, the new values ​​are used to overwrite the old values. The time complexity of the entire function is also O(n). Therefore, the time complexity of using array_keys after using array_flip is O(n).

Next, let’s take a look at the source code of the array_unique function. Click the link to view the original code.

<span> 1</span> <span>/*</span><span> {{{ proto array array_unique(array input [, int sort_flags])
</span><span> 2</span> <span>   Removes duplicate values from array </span><span>*/</span>
<span> 3</span> <span>PHP_FUNCTION(array_unique)
</span><span> 4</span> <span>{
</span><span> 5</span>     <span>//</span><span> 定义变量</span>
<span> 6</span>     zval *array, *<span>tmp;
</span><span> 7</span>     Bucket *<span>p;
</span><span> 8</span>     <span>struct</span><span> bucketindex {
</span><span> 9</span>         Bucket *<span>b;
</span><span>10</span>         unsigned <span>int</span><span> i;
</span><span>11</span> <span>    };
</span><span>12</span>     <span>struct</span> bucketindex *arTmp, *cmpdata, *<span>lastkept;
</span><span>13</span>     unsigned <span>int</span><span> i;
</span><span>14</span>     <span>long</span> sort_type =<span> PHP_SORT_STRING;
</span><span>15</span> 
<span>16</span>     <span>//</span><span> 解析参数</span>
<span>17</span>     <span>if</span> (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, <span>"</span><span>a|l</span><span>"</span>, &array, &sort_type) ==<span> FAILURE) {
</span><span>18</span>         <span>return</span><span>;
</span><span>19</span> <span>    }
</span><span>20</span> 
<span>21</span>     <span>//</span><span> 设置比较函数</span>
<span>22</span> <span>    php_set_compare_func(sort_type TSRMLS_CC);
</span><span>23</span> 
<span>24</span>     <span>//</span><span> 初始化返回数组</span>
<span>25</span> <span>    array_init_size(return_value, zend_hash_num_elements(Z_ARRVAL_P(array)));
</span><span>26</span>     <span>//</span><span> 将值拷贝到新数组</span>
<span>27</span>     zend_hash_copy(Z_ARRVAL_P(return_value), Z_ARRVAL_P(array), (copy_ctor_func_t) zval_add_ref, (<span>void</span> *)&tmp, <span>sizeof</span>(zval*<span>));
</span><span>28</span> 
<span>29</span>     <span>if</span> (Z_ARRVAL_P(array)->nNumOfElements <= <span>1</span>) {    <span>/*</span><span> 什么都不做 </span><span>*/</span>
<span>30</span>         <span>return</span><span>;
</span><span>31</span> <span>    }
</span><span>32</span> 
<span>33</span>     <span>/*</span><span> 根据target_hash buckets的指针创建数组并排序 </span><span>*/</span>
<span>34</span>     arTmp = (<span>struct</span> bucketindex *) pemalloc((Z_ARRVAL_P(array)->nNumOfElements + <span>1</span>) * <span>sizeof</span>(<span>struct</span> bucketindex), Z_ARRVAL_P(array)-><span>persistent);
</span><span>35</span>     <span>if</span> (!<span>arTmp) {
</span><span>36</span> <span>        zval_dtor(return_value);
</span><span>37</span> <span>        RETURN_FALSE;
</span><span>38</span> <span>    }
</span><span>39</span>     <span>for</span> (i = <span>0</span>, p = Z_ARRVAL_P(array)->pListHead; p; i++, p = p-><span>pListNext) {
</span><span>40</span>         arTmp[i].b =<span> p;
</span><span>41</span>         arTmp[i].i =<span> i;
</span><span>42</span> <span>    }
</span><span>43</span>     arTmp[i].b =<span> NULL;
</span><span>44</span>     <span>//</span><span> 排序</span>
<span>45</span>     zend_qsort((<span>void</span> *) arTmp, i, <span>sizeof</span>(<span>struct</span><span> bucketindex), php_array_data_compare TSRMLS_CC);
</span><span>46</span> 
<span>47</span>     <span>/*</span><span> 遍历排序好的数组,然后删除重复的元素 </span><span>*/</span>
<span>48</span>     lastkept =<span> arTmp;
</span><span>49</span>     <span>for</span> (cmpdata = arTmp + <span>1</span>; cmpdata->b; cmpdata++<span>) {
</span><span>50</span>         <span>if</span><span> (php_array_data_compare(lastkept, cmpdata TSRMLS_CC)) {
</span><span>51</span>             lastkept =<span> cmpdata;
</span><span>52</span>         } <span>else</span><span> {
</span><span>53</span>             <span>if</span> (lastkept->i > cmpdata-><span>i) {
</span><span>54</span>                 p = lastkept-><span>b;
</span><span>55</span>                 lastkept =<span> cmpdata;
</span><span>56</span>             } <span>else</span><span> {
</span><span>57</span>                 p = cmpdata-><span>b;
</span><span>58</span> <span>            }
</span><span>59</span>             <span>if</span> (p->nKeyLength == <span>0</span><span>) {
</span><span>60</span>                 zend_hash_index_del(Z_ARRVAL_P(return_value), p-><span>h);
</span><span>61</span>             } <span>else</span><span> {
</span><span>62</span>                 <span>if</span> (Z_ARRVAL_P(return_value) == &<span>EG(symbol_table)) {
</span><span>63</span>                     zend_delete_global_variable(p->arKey, p->nKeyLength - <span>1</span><span> TSRMLS_CC);
</span><span>64</span>                 } <span>else</span><span> {
</span><span>65</span>                     zend_hash_quick_del(Z_ARRVAL_P(return_value), p->arKey, p->nKeyLength, p-><span>h);
</span><span>66</span> <span>                }
</span><span>67</span> <span>            }
</span><span>68</span> <span>        }
</span><span>69</span> <span>    }
</span><span>70</span>     pefree(arTmp, Z_ARRVAL_P(array)-><span>persistent);
</span><span>71</span> <span>}
</span><span>72</span> <span>/*</span><span> }}} </span><span>*/</span>

As you can see, this function initializes a new array, then copies the values ​​to the new array, and then calls the sorting function on line 45 to sort the array. The sorting algorithm is the block tree sorting algorithm of the zend engine. Then iterate through the sorted array and delete duplicate elements. The most expensive part of the entire function is calling the sorting function, and the time complexity of quick sort is O(n*logn). Therefore, the time complexity of this function is O(n*logn).

Conclusion

Because the bottom layer of array_unique calls the quick sort algorithm, which increases the time cost of function running, causing the entire function to run slower. That's why array_keys is faster than array_unique function.

Original article with limited writing style and limited knowledge. If there is anything wrong in the article, please let me know.

If this article is helpful to you, please click Recommend. Writing articles is not easy.

www.bkjia.comtruehttp: //www.bkjia.com/PHPjc/1102855.htmlTechArticle[Performance is King] Analyzing array_keys and array_unique from PHP source code, arraykeys was previously implemented in a faster way [Translated] This article on PHP array deduplication discusses using array_flip and then calling arra...
Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn