PHP Generators (PHP Tutorial)

WBOY (original)
2016-07-13 10:29:41

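The first example below, which reads a file line by line, is implemented in three ways: as an ordinary function, as an iterator, and as a generator. Generators require PHP 5 >= 5.5.0.

To follow the explanation of yield below, read it through line by line.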

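Request for Comments: Generators

  • Date: 2012-06-05
  • Author: Nikita Popov nikic@php.net
  • Status: Implemented

Introduction

Generators provide an easy, boilerplate-free way of implementing iterators.

As an example, consider how you would implement the file() function in userland code: open the file, append each line to an array, and return the array.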
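A minimal sketch of such an array-based implementation might look as follows. The name getLinesFromFile matches the call shown later in this text; the temporary file and its contents are assumptions made only so the example runs on its own.

```php
<?php
// Sketch of a userland file(): every line is read into an array up front.
function getLinesFromFile($fileName) {
    if (!$fileHandle = fopen($fileName, 'r')) {
        return [];
    }

    $lines = [];
    while (false !== $line = fgets($fileHandle)) {
        $lines[] = $line;
    }

    fclose($fileHandle);

    return $lines;
}

// Hypothetical input: a small temporary file created just for this demo.
$fileName = tempnam(sys_get_temp_dir(), 'gen');
file_put_contents($fileName, "first\nsecond\nthird\n");

// The whole file is already in memory before the loop starts.
$lines = getLinesFromFile($fileName);
foreach ($lines as $line) {
    echo $line;
}

unlink($fileName);
```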

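The main disadvantage of this kind of code is evident: it reads the whole file into one large array, and depending on how big the file is, that can consume a great deal of memory. This is usually not what you want. Instead, you want to get the lines one by one, which is exactly what iterators are good at.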

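Sadly, implementing iterators requires an enormous amount of boilerplate code. Consider, for example, an iterator variant of the above function: it needs a class with rewind(), valid(), current(), key() and next() methods.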
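As a rough illustration of that boilerplate, here is a sketch of a line iterator. The class name LineIterator, its property names, and the temporary demo file are assumptions for this example.

```php
<?php
// Sketch of an Iterator-based line reader: every interface method has to
// be written by hand.
class LineIterator implements Iterator {
    protected $fileHandle;
    protected $line;
    protected $i;

    public function __construct($fileName) {
        if (!$this->fileHandle = fopen($fileName, 'r')) {
            throw new RuntimeException('Could not open file "' . $fileName . '"');
        }
    }

    // The #[\ReturnTypeWillChange] lines are plain comments before PHP 8 and
    // silence return-type deprecation notices on PHP 8.1 and later.
    #[\ReturnTypeWillChange]
    public function rewind() {
        fseek($this->fileHandle, 0);
        $this->line = fgets($this->fileHandle);
        $this->i = 0;
    }

    #[\ReturnTypeWillChange]
    public function valid() {
        return false !== $this->line;
    }

    #[\ReturnTypeWillChange]
    public function current() {
        return $this->line;
    }

    #[\ReturnTypeWillChange]
    public function key() {
        return $this->i;
    }

    #[\ReturnTypeWillChange]
    public function next() {
        if (false !== $this->line) {
            $this->line = fgets($this->fileHandle);
            $this->i++;
        }
    }

    public function __destruct() {
        if ($this->fileHandle) {
            fclose($this->fileHandle);
        }
    }
}

// Hypothetical input file for the demo.
$fileName = tempnam(sys_get_temp_dir(), 'gen');
file_put_contents($fileName, "first\nsecond\n");

$lines = [];
foreach (new LineIterator($fileName) as $number => $line) {
    $lines[$number] = $line;
}

unlink($fileName);
```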

A very simple piece of code can easily become very complicated when it is turned into an iterator. Generators solve this problem and let you implement iterators in a very straightforward manner.
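A sketch of the generator variant, under the same assumptions as before (the function name getLinesFromFile and a temporary demo file):

```php
<?php
// Generator variant: lines are yielded one at a time instead of collected.
function getLinesFromFile($fileName) {
    if (!$fileHandle = fopen($fileName, 'r')) {
        return;
    }

    while (false !== $line = fgets($fileHandle)) {
        yield $line;
    }

    fclose($fileHandle);
}

// Hypothetical input file for the demo.
$fileName = tempnam(sys_get_temp_dir(), 'gen');
file_put_contents($fileName, "first\nsecond\nthird\n");

// Calling the function only creates a Generator; the file is not opened yet.
$lines = getLinesFromFile($fileName);

$collected = [];
foreach ($lines as $line) {
    // Each iteration resumes the function until its next yield.
    $collected[] = $line;
}

unlink($fileName);
```

Only one line is held by the generator at any time; the $collected array exists here purely to make the result visible.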

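The generator version looks very similar to the array-based implementation. The main difference is that instead of pushing values into an array, the values are yielded.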

Generators work by passing control back and forth between the generator and the calling code:

When you first call the generator function ($lines = getLinesFromFile($fileName)), the passed argument is bound, but none of the code is actually executed. Instead, the function directly returns a Generator object. That Generator object implements the Iterator interface and is what is eventually traversed by the foreach loop.

Whenever the Iterator::next() method is called, PHP resumes the execution of the generator function until it hits a yield expression. The value of that yield expression is what Iterator::current() then returns.

Generator methods, together with the IteratorAggregate interface, can also be used to easily implement traversable classes: getIterator() simply yields the elements.
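A small sketch of such a traversable class. The class name Collection and its sample data are hypothetical; the point is only that getIterator() is itself a generator.

```php
<?php
class Collection implements IteratorAggregate {
    protected $data;

    public function __construct(array $data) {
        $this->data = $data;
    }

    // A plain comment before PHP 8; avoids a deprecation notice on PHP 8.1+.
    #[\ReturnTypeWillChange]
    public function getIterator() {
        // Because of the yield, calling getIterator() returns a Generator,
        // which satisfies the Traversable requirement of IteratorAggregate.
        foreach ($this->data as $key => $value) {
            yield $key => $value;
        }
    }
}

$collection = new Collection(['a' => 1, 'b' => 2]);

$seen = [];
foreach ($collection as $key => $value) {
    $seen[$key] = $value;
}
```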

Generators can also be used the other way around: instead of producing values, they can consume them. When used in this way they are often referred to as enhanced generators, reverse generators, or coroutines.

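Coroutines are a rather advanced concept, so it is very hard to come up with short examples that are not too contrived. For an introduction, see the example of how to parse streaming XML using coroutines. To learn more, see the presentation on this subject.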

Specification

Recognition of generator functions

Any function which contains a yield statement is automatically a generator function.
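The automatic detection can be observed directly. In the sketch below (all function names are made up for illustration), a function counts as a generator even when its yield can never execute.

```php
<?php
// Ordinary function: the body runs immediately and the value is returned.
function plain() {
    return 1;
}

// Contains a yield, so calling it returns a Generator instead of running the body.
function counter() {
    yield 1;
}

// The yield here can never execute, but its mere presence is enough.
function neverYields() {
    if (false) yield;
}

$plainResult      = plain();
$isGenerator      = counter() instanceof Generator;
$emptyGen         = neverYields();
$isEmptyGenerator = $emptyGen instanceof Generator;

// Running the yield-less generator to completion produces no values.
$producedValues = iterator_to_array($emptyGen);
```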

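The initial implementation required generator functions to be marked with an asterisk modifier (function*). This approach has the advantage that generators are more explicit, and it also allows for yield-less coroutines.

Automatic detection was chosen over the asterisk modifier for the following reasons:

  • There is an existing generator implementation in HipHop PHP, which uses automatic detection. Using the asterisk modifier would break compatibility.
  • All existing generator implementations in other languages (that I know of) also use automatic detection. This includes Python, JavaScript 1.7 and C#. The only exception is the generator support as defined by ECMAScript Harmony, but I know of no browser that actually implements it in the defined way.
  • The syntax for by-reference yielding looks very ugly: function *&gen()
  • Yield-less coroutines are a very narrow use case and are also possible with automatic detection, using code like if (false) yield;.

Basic behavior

When a generator function is called, execution is suspended immediately after parameter binding and a Generator object is returned.

The Generator object implements the following interface: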

final class Generator implements Iterator {
    void  rewind();
    bool  valid();
    mixed current();
    mixed key();
    void  next();

    mixed send(mixed $value);
    mixed throw(Exception $exception);
}

If the generator is not yet at a yield statement (i.e. was just created and not yet used as an iterator), then any call to rewind, valid, current, key, next or send will resume the generator until the next yield statement is hit.

Consider this example:

function gen() {
    echo 'start';
    yield 'middle';
    echo 'end';
}

// Initial call does not output anything
$gen = gen();

// Call to current() resumes the generator, thus "start" is echo'd.
// Then the yield expression is hit and the string "middle" is returned
// as the result of current() and then echo'd.
echo $gen->current();

// Execution of the generator is resumed again, thus echoing "end"
$gen->next();

A nice side-effect of this behavior is that coroutines do not have to be primed with a next() call before they can be used. (This is required in Python and also the reason why coroutines in Python usually use some kind of decorator that automatically primes the coroutine.)

Apart from the above the Generator methods behave as follows:

  • rewind: Throws an exception if the generator is currently after the first yield. (More in the “Rewinding a generator” section.)
  • valid: Returns false if the generator has been closed, true otherwise. (More in the “Closing a generator” section.)
  • current: Returns whatever was passed to yield or null if nothing was passed or the generator is already closed.
  • key: Returns the yielded key or, if none was specified, an auto-incrementing key or null if the generator is already closed. (More in the “Yielding keys” section.)
  • next: Resumes the generator (unless the generator is already closed).
  • send: Sets the return value of the yield expression and resumes the generator (unless the generator is already closed). (More in the “Sending values” section.)
  • throw: Throws an exception at the current suspension point in the generator. (More in the “Throwing into the generator” section.)
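The closed-generator cases in the list above can be checked with a short sketch. The generator oneShot() is a hypothetical example that yields a single value.

```php
<?php
function oneShot() {
    yield 'only';
}

$gen = oneShot();

$first = $gen->current();   // runs the generator up to its single yield
$gen->next();               // resumes it; the function ends, closing the generator

// On a closed generator the methods no longer resume anything:
$valid   = $gen->valid();         // false
$current = $gen->current();       // null
$key     = $gen->key();           // null
$sent    = $gen->send('ignored'); // null, nothing is left to receive the value
```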

Yield syntax

The newly introduced yield keyword (T_YIELD) is used both for sending and receiving values inside the generator. There are three basic forms of the yield expression:

  • yield $key => $value: Yields the value $value with key $key.
  • yield $value: Yields the value $value with an auto-incrementing integer key.
  • yield: Yields the value null with an auto-incrementing integer key.

The return value of the yield expression is whatever was sent to the generator using send(). If nothing was sent (e.g. during foreach iteration) null is returned.
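To make the two directions of yield concrete, here is a minimal sketch with a hypothetical generator echoBack() that yields back whatever it received on the previous step.

```php
<?php
function echoBack() {
    $received = null;
    while (true) {
        // Outward: yields $received. Inward: the expression evaluates to
        // whatever send() passed in, or null if next() was used instead.
        $received = (yield $received);
    }
}

$gen = echoBack();

$initial = $gen->current();        // first yield produces null
$afterSend = $gen->send('hello');  // the yield expression returns 'hello',
                                   // which the next yield hands back

$gen->next();                      // nothing sent: the yield expression returns null
$afterNext = $gen->current();
```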

To avoid ambiguities, the first two yield expression types have to be surrounded by parentheses when used in expression context. Some examples of when parentheses are necessary and when they aren't:

// these three are statements, so they don't need parentheses
yield $key => $value;
yield $value;
yield;

// these are expressions, so they require parentheses
$data = (yield $key => $value);
$data = (yield $value);

// to avoid strange (yield) syntax the parentheses are not required here
$data = yield;

If yield is used inside a language construct that already has native parentheses, then they don't have to be duplicated:

call(yield $value);
// instead of
call((yield $value));

if (yield $value) { ... }
// instead of
if ((yield $value)) { ... }

The only exception is the array() structure. Not requiring parentheses would be ambiguous here:

array(yield $key => $value)
// can be either
array((yield $key) => $value)
// or
array((yield $key => $value))

Python also has parentheses requirements for expression-use of yield. The only difference is that Python also requires parentheses for a value-less yield (because the language does not use semicolons).

See also the "Alternative yield syntax considerations" section.

Yielding keys

The languages that currently implement generators don't have support for yielding keys (only values). This though is just a side-effect as these languages don't support keys in iterators in general.

In PHP, on the other hand, keys are explicitly part of the iteration process, so it would make little sense to omit key-yielding support. The syntax could be analogous to that of foreach loops and array declarations:

yield $key => $value;

Furthermore, generators need to generate keys even if no key was explicitly yielded. In this case it seems reasonable to behave the same way arrays do: start with the key 0 and always increment by one. If an integer key larger than the current auto-key is explicitly yielded in between, it will be used as the starting point for new auto-keys. All other yielded keys do not affect the auto-key mechanism.

function gen() {
    yield 'a';
    yield 'b';
    yield 'key' => 'c';
    yield 'd';
    yield 10 => 'e';
    yield 'f';
}

foreach (gen() as $key => $value) {
    echo $key, ' => ', $value, "\n";
}

// outputs:
// 0 => a
// 1 => b
// key => c
// 2 => d
// 10 => e
// 11 => f

This is the same behavior that arrays have (i.e. if gen() instead simply returned an array with the yielded values, the keys would be the same). The only difference occurs when the generator yields non-integer but numeric keys. For arrays they are cast; for generators they are not.
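The one difference noted above can be seen with a numeric string key. This is a small sketch; the function name numericKey() and the sample key are chosen only for illustration.

```php
<?php
function numericKey() {
    yield '5' => 'a';
}

// Generator: the key arrives exactly as it was yielded, as a string.
$generatorKeyType = null;
foreach (numericKey() as $key => $value) {
    $generatorKeyType = gettype($key);
}

// Array: the same numeric string is cast to an integer key.
$array = ['5' => 'a'];
$arrayKeyType = gettype(key($array));
```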

Yield by reference

Generators can also yield values by reference. To do so, the & modifier is added before the function name, just as is done for return by reference.

This for example allows you to create classes with by-ref iteration behavior (which is something that is completely impossible with normal iterators):

class DataContainer implements IteratorAggregate {
    protected $data;
 
    public function __construct(array $data) {
        $this->data = $data;
    }
 
    public function &getIterator() {
        foreach ($this->data as $key => &$value) {
            yield $key => $value;
        }
    }
}

The class can then be iterated using by-ref foreach:

$dataContainer = new DataContainer([1, 2, 3]);
foreach ($dataContainer as &$value) {
    $value *= -1;
}

// the container's internal $data is now [-1, -2, -3]

Only generators specifying the & modifier can be iterated by ref. If you try to iterate a non-ref generator by-ref, an exception is thrown (see the "Error conditions" section).

Sending values

Values can be sent into a generator using the send() method. send($value) will set $value as the return value of the current yield expression and resume the generator. When the generator hits another yield expression the yielded value will be the return value of send(). This is just a convenience feature to save an additional call to current().

Values are always sent by-value. The reference modifier & only affects yielded values, not the ones sent back to the coroutine.

A simple example of sending values: two (interchangeable) logging implementations:

function echoLogger() {
    while (true) {
        echo 'Log: ' . (yield) . "\n";
    }
}

function fileLogger($fileName) {
    $fileHandle = fopen($fileName, 'a');
    while (true) {
        fwrite($fileHandle, (yield) . "\n");
    }
}

$logger = echoLogger();
// or
$logger = fileLogger(__DIR__ . '/log');

$logger->send('Foo');
$logger->send('Bar');

Throwing into the generator

Exceptions can be thrown into the generator using the Generator::throw() method. This will throw an exception in the generator's execution context and then resume the generator. It is roughly equivalent to replacing the current yield expression with a throw statement and then resuming. If the generator is already closed, the exception will be thrown in the caller's context instead (which is equivalent to replacing the throw() call with a throw statement). The throw() method will return the next yielded value (if the exception is caught and no other exception is thrown).
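Both outcomes described in this paragraph can be sketched briefly. The generators resilient() and once() are hypothetical helpers: the first catches the injected exception and yields again, the second is already closed when throw() is called.

```php
<?php
function resilient() {
    while (true) {
        try {
            yield 'waiting';
        } catch (Exception $e) {
            // The exception surfaces at the suspended yield and is handled here.
            yield 'recovered from ' . $e->getMessage();
        }
    }
}

function once() {
    yield 1;
}

// Case 1: the generator catches the exception; throw() returns the next yield.
$gen = resilient();
$gen->current();                               // suspend at the first yield
$result = $gen->throw(new Exception('boom'));

// Case 2: the generator is already closed, so the exception is raised
// in the caller's context instead.
$closed = once();
$closed->current();
$closed->next();                               // runs to the end, closing it

$caught = null;
try {
    $closed->throw(new Exception('late'));
} catch (Exception $e) {
    $caught = $e->getMessage();
}
```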

An example of the functionality:

function gen() {
    echo "Foo\n";
    try {
        yield;
    } catch (Exception $e) {
        echo "Exception: {$e->getMessage()}\n";
    }
    echo "Bar\n";
}

$gen = gen();
$gen->rewind();                     // echoes "Foo"
$gen->throw(new Exception('Test')); // echoes "Exception: Test"
                                    // and "Bar"

Rewinding a generator

Rewinding to some degree goes against the concept of generators, as they are mainly intended as one-time data sources that are not supposed to be iterated another time. On the other hand, most generators probably *are* rewindable and it might make sense to allow it. One could argue though that rewinding a generator is really bad practice (especially if the generator is doing some expensive calculation). Allowing rewinding would make it look like a cheap operation, just like with arrays. Also, rewinding (as in jumping back to the execution context state at the initial call to the generator) can lead to unexpected behavior, e.g. in the following case:

function getSomeStuff(PDOStatement $stmt) {
    foreach ($stmt as $row) {
        yield doSomethingWith($row);
    }
}

Here rewinding would simply result in an empty iterator as the result set is already depleted.

For the above reasons generators will not support rewinding. The rewind method will throw an exception, unless the generator is currently before or at the first yield. This results in the following behavior:

$gen = createSomeGenerator();

// the rewind() call foreach is doing here is okay, because
// the generator is before the first yield
foreach ($gen as $val) { ... }

// the rewind() call of a second foreach loop on the other hand
// throws an exception
foreach ($gen as $val) { ... }

So basically calling rewind is only allowed if it wouldn't do anything (because the generator is already at its initial state). After that an exception is thrown, so accidentally reused generators are easy to find.
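A runnable sketch of both situations, using a hypothetical generator numbers(): an explicit rewind() after the generator has advanced, and a second foreach over a generator that has already finished.

```php
<?php
function numbers() {
    yield 1;
    yield 2;
    yield 3;
}

// Case 1: rewinding after the generator has moved past its first yield.
$gen = numbers();
$gen->rewind();          // fine: the generator is still at the first yield
$gen->next();            // now suspended at the second yield

$rewindRejected = false;
try {
    $gen->rewind();
} catch (Exception $e) {
    $rewindRejected = true;
}

// Case 2: a second foreach over a generator that already ran to completion.
$gen = numbers();
$firstPass = [];
foreach ($gen as $n) {
    $firstPass[] = $n;
}

$secondPassRejected = false;
try {
    foreach ($gen as $n) {
        // never reached
    }
} catch (Exception $e) {
    $secondPassRejected = true;
}
```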

Cloning a generator

Generators cannot be cloned.

Support for cloning was included in the initial version, but removed in PHP 5.5 Beta 3 due to implementation difficulties, unclear semantics and no particularly convincing use cases.
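A short sketch of what a clone attempt looks like. It assumes PHP 7 or later, where the failure surfaces as a catchable Error; on PHP 5.5 the same attempt is expected to end in a fatal error instead. The generator single() is a hypothetical example.

```php
<?php
function single() {
    yield 1;
}

$gen = single();

$cloneFailed = false;
try {
    $copy = clone $gen;      // generators are uncloneable
} catch (Error $e) {
    $cloneFailed = true;     // assumption: PHP 7+, where this is an Error
}
```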

Closing a generator

When a generator is closed it frees the suspended execution context (as well as all other held variables). After it has been closed valid will return false and both current and key will return null.

A generator can be closed in two ways:

  • Reaching a return statement (or the end of the function) in a generator or throwing an exception from it (without catching it inside the generator).
  • Removing all references to the generator object. In this case the generator will be closed as part of the garbage collection process.

If the generator contains (relevant) finally blocks those will be run. If the generator is force-closed (i.e. by removing all references) then it is not allowed to use yield in the finally clause (a fatal error will be thrown). In all other cases yield is allowed in finally blocks.
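The force-close case can be sketched as follows. The generator guarded() and the by-reference $log array are assumptions introduced only to make the order of events observable.

```php
<?php
function guarded(array &$log) {
    try {
        $log[] = 'started';
        yield 1;
        yield 2;
    } finally {
        // Runs when the generator is closed, even if it never finished.
        $log[] = 'cleanup';
    }
}

$log = [];
$gen = guarded($log);

$gen->current();   // enters the try block and suspends at the first yield
unset($gen);       // last reference removed: the generator is force-closed
```

Note that the finally block only records a message here; as stated above, it must not yield when the generator is being force-closed.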

The following resources are destructed while closing a generator:

  • The current execution context (execute_data)
  • Stack arguments for the generator call, and the additional execution context which is used to manage them.
  • The currently active symbol table (or the compiled variables if no symbol table is in use).
  • The current $this object.
  • If the generator is closed during a method call, the object which the method is invoked on (EX(object)).
  • If the generator is closed during a call, the arguments pushed to the stack.
  • Any foreach loop variables which are still alive (taken from brk_cont_array).
  • The current generator key and value

Currently it can happen that temporary variables are not cleaned up properly in edge-case situations. Exceptions are also subject to this problem: https://bugs.php.net/bug.php?id=62210. If that bug could be fixed for exceptions, then it would also be fixed for generators.

Error conditions

This is a list of generators-related error conditions:

  • Using yield outside a function: E_COMPILE_ERROR
  • Using return with a value inside a generator: E_COMPILE_ERROR
  • Manual construction of Generator class: E_RECOVERABLE_ERROR (analogous to Closure behavior)
  • Yielding a key that isn't an integer or a string: E_ERROR (this is just a placeholder until Etienne's arbitrary-keys patch lands)
  • Trying to iterate a non-ref generator by-ref: Exception
  • Trying to traverse an already closed generator: Exception
  • Trying to rewind a generator after the first yield: Exception
  • Yielding a temp/const value by-ref: E_NOTICE (analogous to return behavior)
  • Yielding a string offset by-ref: E_ERROR (analogous to return behavior)
  • Yielding a by-val function return value by-ref: E_NOTICE (analogous to return behavior)

This list might not be exhaustive.

Performance

You can find a small micro benchmark at https://gist.github.com/2975796. It compares several ways of iterating ranges:

  • Using generators (xrange)
  • Using iterators (RangeIterator)
  • Using arrays implemented in userland (urange)
  • Using arrays implemented internally (range)

For large ranges generators are consistently faster; about four times faster than an iterator implementation and even 40% faster than the native range implementation.

For small ranges (around one hundred elements) the variance of the results is rather high, but from multiple runs it seems that in this case generators are slightly slower than the native implementation, but still faster than the iterator variant.

The tests were run on an Ubuntu VM, so I'm not exactly sure how representative they are.
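For orientation, a generator-based range can be as small as the sketch below. This is not the code from the linked benchmark; the signature of xrange() here is an assumption modeled on range().

```php
<?php
// Lazy counterpart to range(): values are produced one at a time,
// so no array of the full range is ever built.
function xrange($start, $end, $step = 1) {
    for ($i = $start; $i <= $end; $i += $step) {
        yield $i;
    }
}

$sum = 0;
foreach (xrange(1, 1000) as $n) {
    $sum += $n;
}

$small = iterator_to_array(xrange(0, 10, 5));
```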

Some points from the discussion

 

Why not just use callback functions?

A question that has come up a few times during discussion: Why not use callback functions, instead of generators? For example the above getLinesFromFile function could be rewritten using a callback:

function processLinesFromFile($fileName, callable $callback) {
    if (!$fileHandle = fopen($fileName, 'r')) {
        return;
    }

    while (false !== $line = fgets($fileHandle)) {
        $callback($line);
    }

    fclose($fileHandle);
}

processLinesFromFile($fileName, function($line) {
    // do something
});

This approach has two main disadvantages:

Firstly, callbacks integrate badly into the existing PHP coding paradigms. Having quadruply-nested closures is something very normal in languages like JavaScript, but rather rare in PHP. Many things in PHP are based on iteration and generators can nicely integrate with this.

A concrete example, which was actually my initial motivation to write the generators patch:

protected function getTests($directory, $fileExtension) {
    $it = new RecursiveDirectoryIterator($directory);
    $it = new RecursiveIteratorIterator($it, RecursiveIteratorIterator::LEAVES_ONLY);
    $it = new RegexIterator($it, '(\.' . preg_quote($fileExtension) . '$)');

    $tests = array();
    foreach ($it as $file) {
        // read file
        $fileContents = file_get_contents($file);

        // parse sections
        $parts = array_map('trim', explode('-----', $fileContents));

        // first part is the name
        $name = array_shift($parts);

        // multiple sections possible with always two forming a pair
        foreach (array_chunk($parts, 2) as $chunk) {
            $tests[] = array($name, $chunk[0], $chunk[1]);
        }
    }

    return $tests;
}

This is a function which I use to provide test vectors to PHPUnit. I point it to a directory containing test files, and it splits each test file into individual tests and their expected output. I can then use the result of the function to feed a test function via @dataProvider.

The obvious problem with the implementation above is that it has to read all tests into memory at once (instead of one by one).

How can I solve this problem? By turning it into an iterator, obviously! But if you look closer, this isn't actually that easy, because I'm adding new tests in a nested loop. So I would have to implement some kind of complex push-back mechanism to solve the problem. And, getting back on topic, I can't use callbacks here either, because I need a traversable for use with @dataProvider. Generators, on the other hand, solve this problem very elegantly. Actually, all you have to do to turn it into a lazy generator is replace $tests[] = with yield (the $tests array and the final return $tests; are then no longer needed).
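As a minimal sketch of that change, the inner part of the function could look like the following. The name getTestsLazily() and its $files parameter are assumptions made for illustration: the sketch takes any traversable of file paths instead of scanning a directory, so that it stays self-contained.

```php
<?php
// Sketch only: getTestsLazily() and its $files parameter are hypothetical.
// $files may be an array or any Traversable of readable file paths.
function getTestsLazily($files) {
    foreach ($files as $file) {
        // read file
        $fileContents = file_get_contents($file);

        // parse sections
        $parts = array_map('trim', explode('-----', $fileContents));

        // first part is the name
        $name = array_shift($parts);

        // multiple sections possible with always two forming a pair
        foreach (array_chunk($parts, 2) as $chunk) {
            // yield replaces "$tests[] =": each test is handed to the
            // caller as soon as it has been parsed
            yield array($name, $chunk[0], $chunk[1]);
        }
    }
    // no "return $tests;" here: nothing is accumulated
}
```

Because the function now returns a Generator, a caller can simply write foreach (getTestsLazily($files) as $test) { ... }, and each file is only read when the loop reaches it.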

The second, more general problem with callbacks is that it's very hard to manage state across calls. The classic example is a lexer + parser system. If you implement the lexer using a callback (i.e. lex(string $sourceCode, callable $tokenConsumer)) you would have to figure out some way to keep state between subsequent calls. You'd have to build some kind of state machine, which can quickly get really ugly, even for simple problems (just look at the hundreds of states that a typical LALR parser has). Again, generators solve this problem elegantly, because they maintain state implicitly, in the execution state.
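To illustrate the point about implicit state, here is a small sketch of a lexer written as a generator. The lex() function, its token names and the tiny arithmetic grammar are all assumptions chosen for the example; the only state is the local variable $pos, which survives between tokens because the function is suspended at each yield.

```php
<?php
// Hypothetical toy lexer for arithmetic expressions; the token names
// NUMBER and OPERATOR are made up for illustration.
function lex($sourceCode) {
    $length = strlen($sourceCode);
    $pos = 0; // lexer state lives in an ordinary local variable

    while ($pos < $length) {
        // \G anchors each match at the offset passed as the fifth argument
        if (preg_match('/\G\s+/', $sourceCode, $match, 0, $pos)) {
            // whitespace: skip it without emitting a token
        } elseif (preg_match('/\G\d+/', $sourceCode, $match, 0, $pos)) {
            yield array('NUMBER', $match[0]);
        } elseif (preg_match('/\G[-+*\/()]/', $sourceCode, $match, 0, $pos)) {
            yield array('OPERATOR', $match[0]);
        } else {
            throw new InvalidArgumentException("Unexpected character at offset $pos");
        }

        // execution resumes here after the consumer asks for the next token
        $pos += strlen($match[0]);
    }
}
```

A parser can pull tokens with foreach (lex($sourceCode) as $token) { ... } and never has to hand the position back to the lexer, which is what a callback-based design would require.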

Alternative yield syntax considerations

Andrew proposed using a function-like syntax for yield instead of the keyword notation. The three yield variants would then look as follows:

  • yield()
  • yield($value)
  • yield($key => $value)

The main advantage of this syntax is that it would avoid the strange parentheses requirements for the yield $value syntax.

One of the main issues with the pseudo-function syntax is that it makes the semantics of yield less clear. Currently the yield syntax looks very similar to the return syntax. The two are very similar in function, so it is desirable to keep them similar in syntax too.

Generally PHP uses the keyword $expr syntax instead of the keyword($expr) syntax in all places where the statement-use is more common than the expression-use. E.g. include $file; is usually used as a statement and only very rarely as an expression. isset($var) on the other hand is normally used as an expression (a statement use wouldn't make any sense, actually).

As yield will be used as a statement in the vast majority of cases, the yield $expr syntax seems more appropriate. Furthermore, the most common expression-use of yield is value-less, in which case the parentheses requirements don't apply (i.e. you can write just $data = yield;).

So the function-like yield($value) syntax would optimize a very rare use case (namely $recv = yield($send);), at the same time making the common use cases less clear.
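The two expression forms discussed above can be sketched with the keyword syntax as follows. The functions collector() and runningTotal() are hypothetical names used only for this illustration. The parentheses around yield $total are what the PHP 5.5 grammar requires when a yield with a value is used inside an expression; the parenthesized form is also accepted by later versions.

```php
<?php
// Hypothetical example: the value-less form, the common expression use.
function collector(ArrayObject $sink) {
    while (true) {
        $data = yield;          // no parentheses needed
        $sink->append($data);
    }
}

// Hypothetical example: the rarer form that both sends and receives.
function runningTotal() {
    $total = 0;
    while (true) {
        $amount = (yield $total);   // parenthesized, as PHP 5.5 requires
        $total += $amount;
    }
}

$sink = new ArrayObject();
$collect = collector($sink);
$collect->send('first');    // runs to the first yield, then delivers 'first'
$collect->send('second');

$totals = runningTotal();
$totals->current();         // runs to the first yield, which produces 0
$totals->send(5);           // resumes with $amount = 5; the next yield produces 5
$totals->send(7);           // resumes with $amount = 7; the next yield produces 12
```

Each call to send() makes its argument the result of the yield expression the generator is currently suspended at, and returns whatever the next yield produces.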

Patch

The current implementation can be found in this branch: https://github.com/nikic/php-src/tree/addGeneratorsSupport.

I also created a PR so that the diff can be viewed more easily: https://github.com/php/php-src/pull/177

Vote

(Source: https://wiki.php.net/rfc/generators#closing_a_generator)
