
Some difference set methods and performance comparison in PHP

*文 (Original), 2017-12-23 15:55:27

Programs often need to process data, for example by computing the difference between two given arrays. There are many ways to implement this, but which one performs best? This article works through a difference-set example and shows how the code can be optimized.

The problem is as follows: given two arrays of 5000 elements each, compute their difference. Put bluntly, reimplement PHP's array_diff using the best algorithm you can think of. When I first saw this problem it looked very simple, so I "casually" wrote a version based on past experience:

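// Version 1: nested loops. Named array_diff2 because PHP does not allow
// redeclaring the built-in array_diff.
function array_diff2($array_1, $array_2) {
    $diff = array();
    foreach ($array_1 as $k => $v1) {
        $flag = false;
        // Scan $array_2 until a loosely equal (==) value is found.
        foreach ($array_2 as $v2) {
            if ($flag = ($v1 == $v2)) {
                break;
            }
        }
        // Keep the element, with its original key, if nothing matched.
        if (!$flag) {
            $diff[$k] = $v1;
        }
    }
    return $diff;
}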
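
To make the cost of the nested-loop version concrete, here is a minimal sketch that counts how many comparisons it performs. The function name `diff_nested_counted` and the `$comparisons` parameter are hypothetical and exist only for this illustration; the loop logic is the same as above. When the two arrays share no values, the inner loop never breaks early, so the count grows with the product of the two array sizes.

```php
<?php
// Hypothetical helper: same nested-loop logic, but it also counts comparisons.
function diff_nested_counted(array $a, array $b, int &$comparisons): array {
    $comparisons = 0;
    $diff = [];
    foreach ($a as $k => $v1) {
        $found = false;
        foreach ($b as $v2) {
            $comparisons++;
            if ($v1 == $v2) {
                $found = true;
                break;
            }
        }
        if (!$found) {
            $diff[$k] = $v1;
        }
    }
    return $diff;
}

$a = range(1, 100);     // 1..100
$b = range(101, 200);   // 101..200, no overlap with $a (worst case)
$count = 0;
$result = diff_nested_counted($a, $b, $count);
echo count($result), " elements kept, ", $count, " comparisons\n";
```

With 100 elements on each side this performs 100 × 100 comparisons; for two 5000-element arrays the same worst case would be 25,000,000.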

It works, but the efficiency of this function is appalling. So I reconsidered and optimized the algorithm. The second version looks like this:

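// Version 2: let in_array do the search. Named array_diff3 because PHP does
// not allow redeclaring the built-in array_diff.
function array_diff3($array_1, $array_2) {
    foreach ($array_1 as $key => $item) {
        // Strict (===) search; drop the element if it occurs in $array_2.
        if (in_array($item, $array_2, true)) {
            unset($array_1[$key]);
        }
    }
    return $array_1;
}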
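
One detail worth checking before treating this version as a drop-in replacement: with the third argument set to true, in_array compares strictly, while the built-in array_diff treats two elements as equal when their string representations are identical. The sketch below, using the hypothetical name `diff_in_array` for the same logic, shows where the two disagree on mixed integer and string input.

```php
<?php
// Hypothetical name; same logic as the in_array version above.
function diff_in_array(array $a, array $b): array {
    foreach ($a as $key => $item) {
        if (in_array($item, $b, true)) {   // strict comparison (===)
            unset($a[$key]);
        }
    }
    return $a;
}

$a = [1, "2", 3];
$b = ["1", 2];

$strict  = diff_in_array($a, $b);  // nothing matches strictly, so all are kept
$builtin = array_diff($a, $b);     // compares (string) forms, so 1 and "2" go

var_dump($strict);
var_dump($builtin);
```

For arrays that hold only integers, as in the benchmark in this article, the two behave the same.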

This time it is almost as fast as the built-in array_diff function. But is there an even better way? In an article on ChinaUnix (sorry, I cheated), I found that in PHP it can actually be written like this:

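// Version 3: flip $array_2 so its values become keys, then test with isset.
// Named array_diff4 because PHP does not allow redeclaring array_diff.
function array_diff4($array_1, $array_2) {
    $array_2 = array_flip($array_2);
    foreach ($array_1 as $key => $item) {
        // Key lookup in the flipped array instead of a search by value.
        if (isset($array_2[$item])) {
            unset($array_1[$key]);
        }
    }
    return $array_1;
}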
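
The trick depends on what array_flip produces, so a small sketch of the flipped array may help. The sample values are made up for illustration. Note that array_flip only accepts integer and string values (other types trigger a warning and are skipped), so this approach assumes the second array contains nothing else.

```php
<?php
$values = ['apple', 'pear', 'apple'];

// Values become keys; a repeated value keeps the key of its last occurrence.
$lookup = array_flip($values);
print_r($lookup);

// Membership is now a key lookup rather than a search through the values.
$hasPear = isset($lookup['pear']);
$hasPlum = isset($lookup['plum']);
var_dump($hasPear, $hasPlum);
```

Duplicate values in the second array collapse into a single key, which is harmless here because only membership matters.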

This function is remarkably efficient; in my test it was even faster than the built-in array_diff function. Looking into the reason, I found this explanation:


Keys are organized in a hash table, so looking one up is very fast.

Values are merely stored under their keys and have no index of their own, so every search for a value has to traverse the array.

Summary

Although this is a small trick specific to the PHP language, when you need to traverse arrays and compare their values, flipping the values into keys first is indeed much more efficient than the usual value-to-value comparison.


For example, the in_array version above has to call in_array, which loops over the array internally to decide whether the value is present, whereas the array_flip version only checks whether a key exists in the array. Given the different ways array keys and values are indexed, it is understandable that the efficiency is higher than one might expect.
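
The difference between searching by value and looking up by key can be observed in isolation. The following is a rough sketch, not a rigorous benchmark: the array size, the repeat count, and the variable names are arbitrary choices for illustration, and the timings will vary by machine and PHP version.

```php
<?php
$haystack = range(1, 5000);
$lookup   = array_flip($haystack);  // the same values, stored as keys
$needle   = 5000;                   // last element: worst case for a scan
$repeats  = 1000;

// Search by value: in_array walks the array until it finds a match.
$start = microtime(true);
for ($i = 0; $i < $repeats; $i++) {
    $byValue = in_array($needle, $haystack, true);
}
$scanTime = microtime(true) - $start;

// Search by key: isset performs a hash lookup.
$start = microtime(true);
for ($i = 0; $i < $repeats; $i++) {
    $byKey = isset($lookup[$needle]);
}
$hashTime = microtime(true) - $start;

printf("in_array: %.6f s, isset: %.6f s\n", $scanTime, $hashTime);
```

Both searches give the same answer; only the amount of work needed to reach it differs.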

<?php 
// Current Unix timestamp in seconds as a float, with microsecond precision.
function microtime_float() {
    return microtime(true);
}
function array_diff2($array_1, $array_2) { 
    $diff = array(); 
    foreach ($array_1 as $k => $v1) { 
        $flag = false; 
        foreach ($array_2 as $v2) { 
            if ($flag = ($v1 == $v2)) { 
                break; 
            } 
        } 
        if (!$flag) { 
            $diff[$k] = $v1; 
        } 
    } 
    return $diff; 
} 
function array_diff3($array_1, $array_2) { 
    foreach ($array_1 as $key => $item) { 
        if (in_array($item, $array_2, true)) { 
            unset($array_1[$key]); 
        } 
    } 
    return $array_1; 
} 
function array_diff4($array_1, $array_2) { 
    $array_2 = array_flip($array_2); 
    foreach ($array_1 as $key => $item) { 
        if (isset($array_2[$item])) { 
            unset($array_1[$key]); 
        } 
    }
    return $array_1; 
} 
// Benchmark: build two arrays of 5000 random integers in the range 100..999.
for ($i = 0, $ary_1 = array(); $i < 5000; $i++) {
    $ary_1[] = rand(100, 999);
}
for ($i = 0, $ary_2 = array(); $i < 5000; $i++) {
    $ary_2[] = rand(100, 999);
}
header("Content-type: text/plain;charset=utf-8");
// Time the built-in function, then each of the three versions in turn.
$time_start = microtime_float();
array_diff($ary_1, $ary_2);
echo "Function array_diff ran for " . (microtime_float() - $time_start) . " seconds\n";
$time_start = microtime_float();
array_diff2($ary_1, $ary_2);
echo "Function array_diff2 ran for " . (microtime_float() - $time_start) . " seconds\n";
$time_start = microtime_float();
array_diff3($ary_1, $ary_2);
echo "Function array_diff3 ran for " . (microtime_float() - $time_start) . " seconds\n";
$time_start = microtime_float();
array_diff4($ary_1, $ary_2);
echo "Function array_diff4 ran for " . (microtime_float() - $time_start) . " seconds\n";
?>
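
The benchmark above discards the return values, so it says nothing about whether the fast version still produces the right answer. As a sanity check, the sketch below compares the flip-based approach with the built-in array_diff on the same kind of data. The name `diff_by_flip` and the fixed seed are assumptions made for this example, and it uses mt_rand rather than rand so that the seed applies.

```php
<?php
// Hypothetical name; same logic as the array_flip version in the article.
function diff_by_flip(array $a, array $b): array {
    $lookup = array_flip($b);          // values of $b become keys
    foreach ($a as $key => $item) {
        if (isset($lookup[$item])) {   // key lookup instead of a value scan
            unset($a[$key]);
        }
    }
    return $a;
}

mt_srand(42);                          // fixed seed so the run is repeatable
$ary_1 = [];
$ary_2 = [];
for ($i = 0; $i < 5000; $i++) {
    $ary_1[] = mt_rand(100, 999);
    $ary_2[] = mt_rand(100, 999);
}

$expected = array_diff($ary_1, $ary_2);
$actual   = diff_by_flip($ary_1, $ary_2);

// === on arrays requires the same keys, values, types, and order.
$same = ($expected === $actual);
echo $same ? "results match\n" : "results differ\n";
```

This only covers integer input; for other value types the caveats about array_flip and comparison semantics apply.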
