10 Common Mistakes in PHP
Original article:
http://www.toptal.com/php/10-most-common-missing-php- programmers-make
Translation: http://codecloud.net/php-2056.html
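Common Mistake #1: Leaving dangling array references after foreach loops
Not sure how to use foreach loops in PHP? Using references in foreach loops can be useful when you want to operate on each element of the array you are iterating over. For example: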
<code>$arr = array(1, 2, 3, 4);
foreach ($arr as &$value) {
    $value = $value * 2;
}
// $arr is now array(2, 4, 6, 8)</code>
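The problem is that, if you are not careful, this can have undesirable side effects. Specifically, in the example above, after the code runs, $value remains in scope and holds a reference to the last element of the array. Subsequent operations involving $value can therefore unintentionally modify the last element of the array.
The main thing to remember is that foreach does not create a scope. Thus, $value in the example above is a reference within the top-level scope of the script. On each iteration, foreach sets the reference to point to the next element of $arr. After the loop completes, $value therefore still points to the last element of $arr and remains in scope.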
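A minimal sketch of that lingering reference, reusing the $arr example from above (the 'oops' value is purely illustrative):

```php
<?php
$arr = [1, 2, 3, 4];
foreach ($arr as &$value) {
    $value = $value * 2;
}

// The loop is over, but $value is still in scope and still aliases $arr[3].
$value = 'oops';

print_r($arr); // the last element is now 'oops'
```

Nothing in the assignment to $value mentions $arr, which is why this kind of bug is easy to miss in review.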
Here is an example of the kind of elusive and confusing bug this can lead to:
<code>$array = [1, 2, 3];
echo implode(',', $array), "\n";

foreach ($array as &$value) {}    // by reference
echo implode(',', $array), "\n";

foreach ($array as $value) {}     // by value (i.e., copy)
echo implode(',', $array), "\n";</code>
The above code outputs the following:
<code>1,2,3
1,2,3
1,2,2</code>
No, that is not a typo. The last value on the last line is indeed 2, not 3.
Why?
After the first foreach loop, $array is unchanged but, as explained above, $value is left as a dangling reference to the last element of $array (because that foreach loop accessed $value by reference).
As a result, "weird stuff" appears to happen in the second foreach loop. Because $value is now accessed by value (i.e., by copy), foreach copies each successive element of $array into $value on each step of the loop. Here is what happens at each step of the second foreach loop:
Pass 1: Copies $array[0] (i.e., 1) into $value (which is a reference to $array[2]), so $array[2] now equals 1. $array now contains [1, 2, 1].
Pass 2: Copies $array[1] (i.e., 2) into $value (which is a reference to $array[2]), so $array[2] now equals 2. $array now contains [1, 2, 2].
Pass 3: Copies $array[2] (which now equals 2) into $value (which is a reference to $array[2]), so $array[2] still equals 2. $array still contains [1, 2, 2].
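To get the benefit of references in foreach loops without running this risk, call unset() on the variable immediately after the foreach loop to remove the reference. For example:
<code>$arr = array(1, 2, 3, 4);
foreach ($arr as &$value) {
    $value = $value * 2;
}
unset($value); // $value no longer references $arr[3]</code>
Common Mistake #2: Misunderstanding isset() behavior
Despite its name, isset() returns false not only when an item does not exist, but also when its value is null.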
Consider the following:
<code>$data = fetchRecordFromStorage($storage, $identifier);
if (!isset($data['keyShouldBeSet'])) {
    // do something here if 'keyShouldBeSet' is not set
}</code>
The author of this code presumably wanted to check whether keyShouldBeSet was set in $data. But, as discussed, isset($data['keyShouldBeSet']) also returns false if $data['keyShouldBeSet'] was set, but set to null. So the above logic is flawed.
Here is another example:
<code>if ($_POST['active']) {
    $postData = extractSomething($_POST);
}

// ...

if (!isset($postData)) {
    echo 'post not active';
}</code>
The above code assumes that if $_POST['active'] evaluates to true, then $postData will necessarily be set, and therefore isset($postData) will return true. Conversely, it assumes that the only way isset($postData) can return false is if $_POST['active'] evaluated to false as well.
Not so.
As explained, isset($postData)
will also return false
if $postData
was set to null
. It therefore is possible for isset($postData)
to return false
even if $_POST['active']
returned true
. So again, the above logic is flawed.
And by the way, as a side point, if the intent in the above code really was to again check if $_POST['active']
returned true, relying on isset()
for this was a poor coding decision in any case. Instead, it would have been better to just recheck $_POST['active']
; i.e.:
<code>if ($_POST['active']) {
    $postData = extractSomething($_POST);
}

// ...

if (!$_POST['active']) {
    echo 'post not active';
}</code>
For cases, though, where it is important to check if a variable was really set (i.e., to distinguish between a variable that wasn’t set and a variable that was set to null
), the array_key_exists()
function is a much more robust solution.
For example, we could rewrite the first of the above two examples as follows:
<code>$data = fetchRecordFromStorage($storage, $identifier);
if (! array_key_exists('keyShouldBeSet', $data)) {
    // do this if 'keyShouldBeSet' isn't set
}</code>
Moreover, by combining array_key_exists()
with get_defined_vars()
, we can reliably check whether a variable within the current scope has been set or not:
<code>if (array_key_exists('varShouldBeSet', get_defined_vars())) {
    // variable $varShouldBeSet exists in current scope
}</code>
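To make the difference concrete, here is a small sketch that runs both checks against the same array. The $data contents and the 'missingKey' name are made up for illustration:

```php
<?php
// Hypothetical record in which the key exists but holds null.
$data = ['keyShouldBeSet' => null];

$issetExisting  = isset($data['keyShouldBeSet']);            // false: the value is null
$existsExisting = array_key_exists('keyShouldBeSet', $data); // true: the key is present

$issetMissing  = isset($data['missingKey']);                 // false: the key is absent
$existsMissing = array_key_exists('missingKey', $data);      // false: the key is absent

var_dump($issetExisting, $existsExisting, $issetMissing, $existsMissing);
```

Only array_key_exists() distinguishes the "present but null" case from the "absent" case.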
Consider this code snippet:
<code>class Config
{
    private $values = [];

    public function getValues() {
        return $this->values;
    }
}

$config = new Config();

$config->getValues()['test'] = 'test';
echo $config->getValues()['test'];</code>
If you run the above code, you’ll get the following:
<code>PHP Notice: Undefined index: test in /path/to/my/script.php on line 21</code>
What’s wrong?
The issue is that the above code confuses returning arrays by reference with returning arrays by value. Unless you explicitly tell PHP to return an array by reference (i.e., by using &
), PHP will by default return the array “by value”. This means that a copy of the array will be returned and therefore the called function and the caller will not be accessing the same instance of the array.
So the above call to getValues()
returns a copy of the $values
array rather than a reference to it. With that in mind, let’s revisit the two key lines from the above example:
<code>// getValues() returns a COPY of the $values array, so this adds a 'test' element
// to a COPY of the $values array, but not to the $values array itself.
$config->getValues()['test'] = 'test';

// getValues() again returns ANOTHER COPY of the $values array, and THIS copy doesn't
// contain a 'test' element (which is why we get the "undefined index" message).
echo $config->getValues()['test'];</code>
One possible fix would be to save the first copy of the $values
array returned by getValues()
and then operate on that copy subsequently; e.g.:
<code>$vals = $config->getValues();
$vals['test'] = 'test';
echo $vals['test'];</code>
That code will work fine (i.e., it will output test
without generating any “undefined index” message), but depending on what you’re trying to accomplish, this approach may or may not be adequate. In particular, the above code will not modify the original $values
array. So if you do want your modifications (such as adding a ‘test’ element) to affect the original array, you would instead need to modify the getValues()
function to return a reference to the $values
array itself. This is done by adding a &
before the function name, thereby indicating that it should return a reference; i.e.:
<code>class Config
{
    private $values = [];

    // return a REFERENCE to the actual $values array
    public function &getValues() {
        return $this->values;
    }
}

$config = new Config();

$config->getValues()['test'] = 'test';
echo $config->getValues()['test'];</code>
The output of this will be test
, as expected.
But to make things more confusing, consider instead the following code snippet:
<code>class Config
{
    private $values;

    // using ArrayObject rather than array
    public function __construct() {
        $this->values = new ArrayObject();
    }

    public function getValues() {
        return $this->values;
    }
}

$config = new Config();

$config->getValues()['test'] = 'test';
echo $config->getValues()['test'];</code>
If you guessed that this would result in the same “undefined index” error as our earlier array
example, you were wrong. In fact, this code will work just fine. The reason is that, unlike arrays, PHP always passes objects by reference. (ArrayObject
is an SPL object that mimics array usage but works as an object.)
As these examples demonstrate, it is not always entirely obvious in PHP whether you are dealing with a copy or a reference. It is therefore essential to understand these default behaviors (i.e., variables and arrays are passed by value; objects are passed by reference) and also to carefully check the API documentation for the function you are calling to see if it is returning a value, a copy of an array, a reference to an array, or a reference to an object.
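As a rough illustration of those defaults, the sketch below passes an array and an ArrayObject to two functions that do the same thing. Both function names are hypothetical and exist only for this example:

```php
<?php
// Both helper functions are hypothetical and exist only for this illustration.
function addToArray(array $values)
{
    $values['test'] = 'test'; // modifies the function's local copy only
}

function addToObject(ArrayObject $values)
{
    $values['test'] = 'test'; // modifies the object the caller also holds
}

$array = [];
addToArray($array);

$object = new ArrayObject();
addToObject($object);

var_dump(isset($array['test']));  // bool(false)
var_dump(isset($object['test'])); // bool(true)
```

The two function bodies are identical; only the type of the argument decides whether the caller sees the change.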
All that said, it is important to note that the practice of returning a reference to an array or an ArrayObject
is generally something that should be avoided, as it provides the caller with the ability to modify the instance’s private data. This “flies in the face” of encapsulation. Instead, it’s better to use old style “getters” and “setters”, e.g.:
<code>class Config
{
    private $values = [];

    public function setValue($key, $value) {
        $this->values[$key] = $value;
    }

    public function getValue($key) {
        return $this->values[$key];
    }
}

$config = new Config();

$config->setValue('testKey', 'testValue');
echo $config->getValue('testKey');    // echoes 'testValue'</code>
This approach gives the caller the ability to set or get any value in the array without providing public access to the otherwise-private $values
array itself.
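One thing the getter above does not cover is a key that was never set, in which case it raises an "undefined index" notice. A possible variation is sketched below; the SafeConfig class name, the hasValue() method, and the $default parameter are assumptions added for illustration, not part of the original example:

```php
<?php
// SafeConfig, hasValue() and the $default parameter are hypothetical additions.
class SafeConfig
{
    private $values = [];

    public function setValue($key, $value)
    {
        $this->values[$key] = $value;
    }

    public function hasValue($key)
    {
        // array_key_exists() also reports keys whose value is null.
        return array_key_exists($key, $this->values);
    }

    public function getValue($key, $default = null)
    {
        return $this->hasValue($key) ? $this->values[$key] : $default;
    }
}

$config = new SafeConfig();
$config->setValue('testKey', 'testValue');

echo $config->getValue('testKey'), "\n";                // testValue
echo $config->getValue('missingKey', 'fallback'), "\n"; // fallback
```

The private array is still never exposed, so the encapsulation argument made above continues to hold.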
It’s not uncommon to come across something like this if your PHP is not working:
<code>$models = [];

foreach ($inputValues as $inputValue) {
    $models[] = $valueRepository->findByValue($inputValue);
}</code>
While there may be absolutely nothing wrong here, if you follow the logic in the code, you may find that the innocent-looking call above to $valueRepository->findByValue()
ultimately results in a query of some sort, such as:
<code>$result = $connection->query("SELECT `x`,`y` FROM `values` WHERE `value`=" . $inputValue);</code>
As a result, each iteration of the above loop would result in a separate query to the database. So if, for example, you supplied an array of 1,000 values to the loop, it would generate 1,000 separate queries to the resource! If such a script is called in multiple threads, it could potentially bring the system to a grinding halt.
It’s therefore crucial to recognize when queries are being made by your code and, whenever possible, gather the values and then run one query to fetch all the results.
One example of a fairly common place to encounter querying being done inefficiently (i.e., in a loop) is when a form is posted with a list of values (IDs, for example). Then, to retrieve the full record data for each of the IDs, the code will loop through the array and do a separate SQL query for each ID. This will often look something like this:
<code>$data = [];
foreach ($ids as $id) {
    $result = $connection->query("SELECT `x`, `y` FROM `values` WHERE `id` = " . $id);
    $data[] = $result->fetch_row();
}</code>
But the same thing can be accomplished much more efficiently in a single SQL query as follows:
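<code>$data = [];
if (count($ids)) {
    // cast to int so that only numeric IDs reach the query
    $idList = implode(',', array_map('intval', $ids));
    $result = $connection->query("SELECT `x`, `y` FROM `values` WHERE `id` IN (" . $idList . ")");
    while ($row = $result->fetch_row()) {
        $data[] = $row;
    }
}</code>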
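If the IDs come from user input, a prepared statement is one way to run the same single query without concatenating values into the SQL. The following is a sketch under assumptions: inPlaceholders() and fetchValuesByIds() are hypothetical helper names, the table and column names are taken from the example above, and the argument unpacking in bind_param() requires PHP 5.6 or later.

```php
<?php
// inPlaceholders() and fetchValuesByIds() are hypothetical helpers for this sketch.

// Builds "?, ?, ?" with one placeholder per ID for an IN (...) clause.
function inPlaceholders(array $ids): string
{
    return implode(', ', array_fill(0, count($ids), '?'));
}

// Fetches all matching rows with a single prepared query.
function fetchValuesByIds(mysqli $connection, array $ids): array
{
    if (count($ids) === 0) {
        return [];
    }
    $ids = array_values(array_map('intval', $ids));

    $sql = 'SELECT `x`, `y` FROM `values` WHERE `id` IN (' . inPlaceholders($ids) . ')';
    $stmt = $connection->prepare($sql);
    $stmt->bind_param(str_repeat('i', count($ids)), ...$ids);
    $stmt->execute();
    $stmt->bind_result($x, $y);

    $data = [];
    while ($stmt->fetch()) {
        $data[] = [$x, $y];
    }
    $stmt->close();

    return $data;
}

echo inPlaceholders([3, 5, 8]), "\n"; // ?, ?, ?
```

fetchValuesByIds() is not called here because it needs a live database connection; only the placeholder builder is exercised.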
It’s therefore crucial to recognize when queries are being made, either directly or indirectly, by your code. Whenever possible, gather the values and then run one query to fetch all the results. Yet caution must be exercised there as well, which leads us to our next common PHP mistake…
While fetching many records at once is definitely more efficient than running a single query for each row to fetch, such an approach can potentially lead to an “out of memory” condition in libmysqlclient
when using PHP’s mysql
extension.
To demonstrate, let’s take a look at a test box with limited resources (512MB RAM), MySQL, and php-cli
.
We’ll bootstrap a database table like this:
<code>// connect to mysql
$connection = new mysqli('localhost', 'username', 'password', 'database');

// create table of 400 columns
$query = 'CREATE TABLE `test`(`id` INT NOT NULL PRIMARY KEY AUTO_INCREMENT';
for ($col = 0; $col < 400; $col++) {
    $query .= ", `col$col` CHAR(10) NOT NULL";
}
$query .= ');';
$connection->query($query);

// write 2 million rows
for ($row = 0; $row < 2000000; $row++) {
    $query = "INSERT INTO `test` VALUES ($row";
    for ($col = 0; $col < 400; $col++) {
        $query .= ', ' . mt_rand(1000000000, 9999999999);
    }
    $query .= ')';
    $connection->query($query);
}</code>
OK, now let’s check resources usage:
<code>// connect to mysql
$connection = new mysqli('localhost', 'username', 'password', 'database');
echo "Before: " . memory_get_peak_usage() . "\n";

$res = $connection->query('SELECT `x`,`y` FROM `test` LIMIT 1');
echo "Limit 1: " . memory_get_peak_usage() . "\n";

$res = $connection->query('SELECT `x`,`y` FROM `test` LIMIT 10000');
echo "Limit 10000: " . memory_get_peak_usage() . "\n";</code>
Output:
<code>Before: 224704
Limit 1: 224704
Limit 10000: 224704</code>
Cool. Looks like the query is safely managed internally in terms of resources.
Just to be sure, though, let’s boost the limit one more time and set it to 100,000. Uh-oh. When we do that, we get:
<code>PHP Warning: mysqli::query(): (HY000/2013): Lost connection to MySQL server during query in /root/test.php on line 11</code>
What happened?
The issue here is the way PHP’s mysql
module works. It’s really just a proxy for libmysqlclient
, which does the dirty work. When a portion of data is selected, it goes directly into memory. Since this memory is not managed by PHP’s memory manager, memory_get_peak_usage()
won’t show any increase in resources utilization as we up the limit in our query. This leads to problems like the one demonstrated above where we’re tricked into complacency thinking that our memory management is fine. But in reality, our memory management is seriously flawed and we can experience problems like the one shown above.
You can at least avoid the above headfake (although it won’t itself improve your memory utilization) by instead using the mysqlnd
module. mysqlnd
is compiled as a native PHP extension and it does use PHP’s memory manager.
Therefore, if we run the above test using mysqlnd
rather than mysql
, we get a much more realistic picture of our memory utilization:
<code>Before: 232048
Limit 1: 324952
Limit 10000: 32572912</code>
And it’s even worse than that, by the way. According to PHP documentation, mysql
uses twice as many resources as mysqlnd
to store data, so the original script using mysql
really used even more memory than shown here (roughly twice as much).
To avoid such problems, consider limiting the size of your queries and using a loop with a small number of iterations; e.g.:
<code>$totalNumberToFetch = 10000;
$portionSize = 100;

// ceil() gives the number of portions; $i runs from 0 to that number minus 1
for ($i = 0; $i < ceil($totalNumberToFetch / $portionSize); $i++) {
    $limitFrom = $portionSize * $i;
    $res = $connection->query(
        "SELECT `x`,`y` FROM `test` LIMIT $limitFrom, $portionSize");
}</code>
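When the total is not an exact multiple of the portion size, the last portion needs a smaller row count. One way to keep that arithmetic in a single place is a small generator. This is only a sketch: limitWindows() is a hypothetical helper name, and the figures 250 and 100 are arbitrary.

```php
<?php
// limitWindows() is a hypothetical helper; it yields [offset, rowCount] pairs.
function limitWindows(int $totalNumberToFetch, int $portionSize)
{
    for ($offset = 0; $offset < $totalNumberToFetch; $offset += $portionSize) {
        // The last window may be smaller than $portionSize.
        yield [$offset, min($portionSize, $totalNumberToFetch - $offset)];
    }
}

// Print the queries that would be issued instead of running them.
foreach (limitWindows(250, 100) as [$limitFrom, $rowCount]) {
    echo "SELECT `x`,`y` FROM `test` LIMIT $limitFrom, $rowCount\n";
}
```

The loop here only prints the statements, so the sketch runs without a database; in real code each string would be passed to $connection->query().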
When we consider both this PHP mistake and mistake #4 above, we realize that your code ideally needs to strike a healthy balance between queries that are too granular and repetitive on the one hand, and individual queries that are too large on the other. As is true with most things in life, balance is needed; either extreme can cause problems with PHP not working properly.
In some sense, this is really more of an issue in PHP itself than something you would run into while debugging PHP, but it has never been adequately addressed. PHP 6’s core was to be made Unicode-aware, but that was put on hold when development of PHP 6 was suspended back in 2010.
But that by no means absolves the developer from properly handling UTF-8 and avoiding the erroneous assumption that all strings will necessarily be “plain old ASCII”. Code that fails to properly handle non-ASCII strings is notorious for introducing gnarly heisenbugs. Even simple strlen($_POST['name'])
calls could cause problems if someone with a last name like “Schrödinger” tried to sign up into your system.
Here’s a small checklist to avoid such problems in your code:
- Use the mb_* functions instead of the old string functions (make sure the “multibyte” extension is included in your PHP build).
- Make sure your database and tables are set to use Unicode (many builds of MySQL still use latin1 by default).
- Remember that json_encode() converts non-ASCII symbols (e.g., “Schrödinger” becomes “Schr\u00f6dinger”) but serialize() does not.
A particularly valuable resource in this regard is the UTF-8 Primer for PHP and MySQL post by Francisco Claria on this blog.
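The strlen() problem mentioned above is easy to reproduce. The sketch below assumes the name is UTF-8 encoded and uses the "\u{...}" escape (PHP 7 or later) so that the result does not depend on the encoding of the source file:

```php
<?php
// "\u{00F6}" produces the UTF-8 bytes for "ö" (escape syntax requires PHP 7+).
$name = "Schr\u{00F6}dinger";

echo strlen($name), "\n"; // counts bytes: 12

// mb_strlen() needs the mbstring extension, so guard the call.
if (function_exists('mb_strlen')) {
    echo mb_strlen($name, 'UTF-8'), "\n"; // counts characters: 11
}
```

A length limit enforced with strlen() would therefore reject an 11-character name at a limit of 11.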
Common Mistake #7: Assuming $_POST will always contain your POST data
Despite its name, the $_POST
array won’t always contain your POST data and can be easily found empty. To understand this, let’s take a look at an example. Assume we make a server request with a jQuery.ajax()
call as follows:
<code>// js
$.ajax({
    url: 'http://my.site/some/path',
    method: 'post',
    data: JSON.stringify({a: 'a', b: 'b'}),
    contentType: 'application/json'
});</code>
(Incidentally, note the contentType: 'application/json'
here. We send data as JSON, which is quite popular for APIs. It’s the default, for example, for posting in the AngularJS $http
service.)
On the server side of our example, we simply dump the $_POST
array:
<code>// php
var_dump($_POST);</code>
Surprisingly, the result will be:
<code>array(0) { }</code>
Why? What happened to our JSON string {a: 'a', b: 'b'}
?
The answer is that PHP only parses a POST payload automatically when it has a content type of application/x-www-form-urlencoded
or multipart/form-data
. The reasons for this are historical — these two content types were essentially the only ones used years ago when PHP’s $_POST
was implemented. So with any other content type (even those that are quite popular today, like application/json
), PHP doesn’t automatically load the POST payload.
Since $_POST
is a superglobal, if we override it once (preferably early in our script), the modified value (i.e., including the POST payload) will then be referenceable throughout our code. This is important since $_POST
is commonly used by PHP frameworks and almost all custom scripts to extract and transform request data.
So, for example, when processing a POST payload with a content type of application/json
, we need to manually parse the request contents (i.e., decode the JSON data) and override the $_POST
variable, as follows:
<code>// php
$_POST = json_decode(file_get_contents('php://input'), true);</code>
Then when we dump the $_POST
array, we see that it correctly includes the POST payload; e.g.:
<code>array(2) { ["a"]=> string(1) "a" ["b"]=> string(1) "b" }</code>
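The one-line override above decodes the body regardless of its content type and leaves $_POST as null when the JSON is malformed. A slightly more defensive variant might look like the sketch below. decodeJsonBody() is a hypothetical helper, not a PHP built-in, and the content type and body are passed in as plain strings so that the example runs outside a web request:

```php
<?php
// decodeJsonBody() is a hypothetical helper, not part of PHP.
function decodeJsonBody($contentType, $rawBody)
{
    // Accept "application/json", optionally followed by parameters
    // such as "; charset=utf-8".
    if (stripos($contentType, 'application/json') !== 0) {
        return null;
    }

    $decoded = json_decode($rawBody, true);

    // json_decode() returns null on malformed input; only accept arrays.
    return is_array($decoded) ? $decoded : null;
}

// In a real request these values would typically come from
// $_SERVER['CONTENT_TYPE'] and file_get_contents('php://input').
$payload = decodeJsonBody('application/json; charset=utf-8', '{"a":"a","b":"b"}');
var_dump($payload);
```

The caller can then decide whether to assign the result to $_POST or to reject the request when null comes back.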
Look at this sample piece of code and try guessing what it will print:
<code>for ($c = 'a'; $c <= 'z'; $c++) {
    echo $c . "\n";
}</code>
If you answered ‘a’ through ‘z’, you may be surprised to know that you were wrong.
Yes, it will print ‘a’ through ‘z’, but then it will also print ‘aa’ through ‘yz’. Let’s see why.
In PHP there’s no char
datatype; only string
is available. With that in mind, incrementing the string
z
in PHP yields aa
:
<code>php> $c = 'z'; echo ++$c . "\n";
aa</code>
Yet to further confuse matters, aa
is lexicographically less than z
:
<code>php> var_export((boolean)('aa' < 'z')) . "\n";
true</code>
That’s why the sample code presented above prints the letters a through z, but then also prints aa through yz. It stops when it reaches za, which is the first value it encounters that is “greater than” z:
<code>php> var_export((boolean)('za' < 'z')) . "\n";
false</code>
That being the case, here’s one way to properly loop through the values ‘a’ through ‘z’ in PHP:
<code>for ($i = ord('a'); $i <= ord('z'); $i++) {
    echo chr($i) . "\n";
}</code>
Or alternatively:
<code>$letters = range('a', 'z');

for ($i = 0; $i < count($letters); $i++) {
    echo $letters[$i] . "\n";
}</code>
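To see how far the original loop overshoots, the sketch below collects its values into an array instead of echoing them, then compares the count with range(). The $printed variable is introduced only for this illustration:

```php
<?php
// Collect what the original loop would have printed.
$printed = [];
for ($c = 'a'; $c <= 'z'; $c++) {
    $printed[] = $c;
}

echo count($printed), "\n"; // 676: 26 single letters plus 'aa' through 'yz'
echo end($printed), "\n";   // yz

// range() yields exactly the 26 letters.
echo count(range('a', 'z')), "\n"; // 26
```

The 676 figure is 26 single letters plus 25 × 26 two-letter strings, since the first letter of the pair only runs from a to y.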
Although ignoring coding standards doesn’t directly lead to needing to debug PHP code, it is still probably one of the most important things to discuss here.
Ignoring coding standards can cause a whole slew of problems on a project. At best, it results in code that is inconsistent (since every developer is “doing their own thing”). But at worst, it produces PHP code that does not work or can be difficult (sometimes almost impossible) to navigate, making it extremely difficult to debug, enhance, and maintain. And that means reduced productivity for your team, including lots of wasted (or at least unnecessary) effort.
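Fortunately for PHP developers, there is the PHP Standards Recommendation (PSR), comprising the following five standards:
- PSR-0: Autoloading Standard
- PSR-1: Basic Coding Standard
- PSR-2: Coding Style Guide
- PSR-3: Logger Interface
- PSR-4: Autoloader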
PSR was originally created based on inputs from maintainers of the most recognized platforms on the market. Zend, Drupal, Symfony, Joomla and others contributed to these standards, and are now following them. Even PEAR, which attempted to be a standard for years before that, participates in PSR now.
In some sense, it almost doesn’t matter what your coding standard is, as long as you agree on a standard and stick to it, but following the PSR is generally a good idea unless you have some compelling reason on your project to do otherwise. More and more teams and projects are conforming with the PSR. It’s definitely recognized at this point as “the” standard by the majority of PHP developers, so using it will help ensure that new developers are familiar and comfortable with your coding standard when they join your team.
Common Mistake #10: Misusing empty()
Some PHP developers like using empty()
for boolean checks for just about everything. There are cases, though, where this can lead to confusion.
First, let’s come back to arrays and ArrayObject
instances (which mimic arrays). Given their similarity, it’s easy to assume that arrays and ArrayObject
instances will behave identically. This proves, however, to be a dangerous assumption. For example, in PHP 5.0:
<code>// PHP 5.0 or later:
$array = [];
var_dump(empty($array));        // outputs bool(true)
$array = new ArrayObject();
var_dump(empty($array));        // outputs bool(false)
// why don't these both produce the same output?</code>
And to make matters even worse, the results would have been different prior to PHP 5.0:
<code>// Prior to PHP 5.0:
$array = [];
var_dump(empty($array));        // outputs bool(false)
$array = new ArrayObject();
var_dump(empty($array));        // outputs bool(false)</code>
This approach is unfortunately quite popular. For example, this is the way Zend\Db\TableGateway
of Zend Framework 2 returns data when calling current()
on TableGateway::select()
result as the doc suggests. Developers can easily fall victim to this mistake with such data.
To avoid these issues, the better approach to checking for empty array structures is to use count()
:
<code>// Note that this works in ALL versions of PHP (both pre and post 5.0):
$array = [];
var_dump(count($array));        // outputs int(0)
$array = new ArrayObject();
var_dump(count($array));        // outputs int(0)</code>
And incidentally, since PHP casts 0
to false
, count()
can also be used within if ()
conditions to check for empty arrays. It’s also worth noting that, in PHP, count()
is constant complexity (O(1)
operation) on arrays, which makes it even clearer that it’s the right choice.
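If the same check is needed in several places, it can be wrapped in a small function that treats arrays and countable objects alike. This is a sketch only; isEmptyCollection() is a hypothetical helper name:

```php
<?php
// isEmptyCollection() is a hypothetical helper name.
function isEmptyCollection($collection)
{
    // count() is only meaningful for arrays and Countable objects.
    if (!is_array($collection) && !($collection instanceof Countable)) {
        throw new InvalidArgumentException('Expected an array or a Countable object.');
    }

    return count($collection) === 0;
}

var_dump(isEmptyCollection([]));                   // bool(true)
var_dump(isEmptyCollection(new ArrayObject()));    // bool(true)
var_dump(isEmptyCollection(new ArrayObject([1]))); // bool(false)
```

Rejecting other types explicitly avoids silently counting a scalar or a non-countable object.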
Another example when empty()
can be dangerous is when combining it with the magic class function __get()
. Let’s define two classes and have a test
property in both.
First let’s define a Regular
class that includes test
as a normal property:
<code>class Regular
{
    public $test = 'value';
}</code>
Then let’s define a Magic
class that uses the magic __get()
operator to access its test
property:
<code>class Magic
{
    private $values = ['test' => 'value'];

    public function __get($key)
    {
        if (isset($this->values[$key])) {
            return $this->values[$key];
        }
    }
}</code>
OK, now let’s see what happens when we attempt to access the test
property of each of these classes:
<code>$regular = new Regular();
var_dump($regular->test);    // outputs string(4) "value"
$magic = new Magic();
var_dump($magic->test);      // outputs string(4) "value"</code>
Fine so far.
But now let’s see what happens when we call empty()
on each of these:
<code>var_dump(empty($regular->test));    // outputs bool(false)
var_dump(empty($magic->test));      // outputs bool(true)</code>
Ugh. So if we rely on empty()
, we can be misled into believing that the test
property of $magic
is empty, whereas in reality it is set to 'value'
.
Unfortunately, if a class uses the magic __get()
function to retrieve a property’s value, there’s no foolproof way to check if that property value is empty or not. Outside of the class’ scope, you can really only check if a null
value will be returned, and that doesn’t necessarily mean that the corresponding key is not set, since it actually could have been set to null
.
In contrast, if we attempt to reference a non-existent property of a Regular
class instance, we will get a notice similar to the following:
<code>Notice: Undefined property: Regular::$nonExistantTest in /path/to/test.php on line 10
Call Stack:
    0.0012     234704   1. {main}() /path/to/test.php:0</code>
So the main point here is that the empty()
function should be used with care, as it can produce confusing or even misleading results.
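When you control the class yourself, one option is to define the magic __isset() method alongside __get(), because isset() and empty() on an inaccessible property consult __isset() first. The MagicWithIsset class below is a hypothetical variant of the Magic class from above, shown only as a sketch:

```php
<?php
// MagicWithIsset is a hypothetical variant of the Magic class shown above.
class MagicWithIsset
{
    private $values = ['test' => 'value'];

    public function __get($key)
    {
        if (isset($this->values[$key])) {
            return $this->values[$key];
        }
        return null;
    }

    // isset() and empty() on an inaccessible property call __isset() first.
    public function __isset($key)
    {
        return isset($this->values[$key]);
    }
}

$magic = new MagicWithIsset();
var_dump(empty($magic->test));    // bool(false)
var_dump(empty($magic->missing)); // bool(true)
```

This does not help with third-party classes that omit __isset(), so the caution above about code outside the class still applies.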
PHP’s ease of use can lull developers into a false sense of comfort, leaving them vulnerable to lengthy PHP debugging due to some of the nuances and idiosyncrasies of the language. This can result in PHP not working and problems such as those described herein.
The PHP language has evolved significantly over the course of its 20-year history. Familiarizing oneself with its subtleties is a worthwhile endeavor, as it will help ensure that the software you produce is more scalable, robust, and maintainable.