
Advanced usage of iterators and generators in Python

By 高洛峰 · 2017-03-01 14:09

Iterator

An iterator is an object that implements the iteration protocol: essentially, it has a next method (__next__ in Python 3) which, when called, returns the next item in the sequence, and which raises the StopIteration exception when no items are left.

An iterator allows only a single pass over a sequence. It holds the state (position) of one iteration; put the other way around, each pass over a sequence requires its own iterator object. This means we can iterate over the same sequence more than once concurrently. Separating the iteration logic from the sequence gives us more ways to iterate.
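This single-pass behaviour is easy to see with the built-in iter function (introduced just below); a minimal sketch:

>>> nums = [1, 2, 3]
>>> it = iter(nums)
>>> [n for n in it]          # the first pass consumes the iterator
[1, 2, 3]
>>> [n for n in it]          # the same iterator is now exhausted
[]
>>> [n for n in iter(nums)]  # a fresh iterator starts from the beginning
[1, 2, 3]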

The most straightforward way to create an iterator is to call the __iter__ method of a container. The iter function saves us some keystrokes.

>>> nums = [1,2,3]   # note that ... varies: these are different objects
>>> iter(nums)              
<listiterator object at ...>
>>> nums.__iter__()           
<listiterator object at ...>
>>> nums.__reversed__()         
<listreverseiterator object at ...>

>>> it = iter(nums)
>>> next(it)      # next(obj) simply calls obj.next()
1
>>> it.next()
2
>>> next(it)
3
>>> next(it)
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
StopIteration

When used in a loop, StopIteration is swallowed and simply ends the loop. But when invoked explicitly, we can see that once the iterator is exhausted, accessing it raises an exception.

for...in loops also use the __iter__ method. This allows us to transparently start iterating over a sequence. But if we already have an iterator, we want to be able to use it in a for loop in the same way. To enable this, iterators, in addition to next, also have an __iter__ method that returns the iterator itself (self).
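As a rough illustration, here is a minimal hand-written iterator (the class name Countdown is invented for this example) that implements both methods, so it works directly in a for loop:

>>> class Countdown(object):
...     def __init__(self, start):
...         self.current = start
...     def __iter__(self):       # returning self makes it usable in for loops
...         return self
...     def next(self):           # named __next__ in Python 3
...         if self.current <= 0:
...             raise StopIteration
...         self.current -= 1
...         return self.current + 1
>>> for n in Countdown(3):
...     print(n)
3
2
1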

Support for iterators is ubiquitous in Python: all sequences and unordered containers in the standard library support it. The concept is also extended to other things: for example, file objects support iteration over lines.

>>> f = open('/etc/fstab')
>>> f is f.__iter__()
True

The file is its own iterator: its __iter__ method does not create a separate object, so only a single pass of sequential reading is allowed.
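A consequence is that explicit reads and iteration share the same position; a small sketch, assuming /etc/fstab exists as above:

>>> f = open('/etc/fstab')
>>> first = next(f)    # reads the first line
>>> for line in f:     # the loop resumes from the second line,
...     pass           # because iter(f) is f itself
>>> f.close()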

Generator expressions

A second way to create iterator objects is through generator expressions, the basis of list comprehensions. To increase clarity, a generator expression must always be enclosed in parentheses or brackets. If round parentheses are used, a generator iterator is created. If square brackets are used, the process is 'short-circuited' and we get a list.

>>> (i for i in nums)          
<generator object <genexpr> at 0x...>
>>> [i for i in nums]
[1, 2, 3]
>>> list(i for i in nums)
[1, 2, 3]

In Python 2.7 and 3.x, the comprehension syntax was extended to dictionary and set comprehensions. A set is created when a generator expression is enclosed in curly braces. A dict is created when the expression contains key-value pairs of the form key: value:

>>> {i for i in range(3)}  
set([0, 1, 2])
>>> {i:i**2 for i in range(3)}  
{0: 0, 1: 1, 2: 4}

If you are unfortunate enough to be stuck on an ancient version of Python, the syntax is a bit worse:

>>> set(i for i in 'abc')
set(['a', 'c', 'b'])
>>> dict((i, ord(i)) for i in 'abc')
{'a': 97, 'c': 99, 'b': 98}

Generator expressions are fairly simple and mostly self-explanatory. There is only one gotcha worth mentioning: in Python versions before 3, the index variable (i) of a list comprehension leaks into the enclosing scope.
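The leak is easy to demonstrate on Python 2 (in Python 3 the comprehension variable no longer escapes):

>>> i = 10
>>> [i for i in range(3)]       # the list comprehension rebinds i...
[0, 1, 2]
>>> i                           # ...and in Python 2 the binding leaks out
2
>>> j = 10
>>> list(j for j in range(3))   # a generator expression does not leak
[0, 1, 2]
>>> j
10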

Generators

Generators are functions that produce a sequence of results instead of a single value.

A third way to create iterator objects is to call a generator function. A generator is a function containing the keyword yield. It is worth noting that the mere presence of this keyword completely changes the nature of the function: the yield statement doesn't have to be invoked, or even reachable, but it causes the function to become a generator. When an ordinary function is called, the instructions contained in its body start being executed. When a generator is called, execution stops before the first instruction in the body. A call to a generator function creates a generator object that adheres to the iteration protocol. As with ordinary function calls, concurrent and recursive invocations are allowed.

When next is called, the function executes up to the first yield. Each encountered yield statement supplies a value that becomes the return value of next. After executing the yield statement, execution of the function is suspended.

>>> def f():
...  yield 1
...  yield 2
>>> f()                  
<generator object f at 0x...>
>>> gen = f()
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
StopIteration

Let’s walk through the entire process of a single generator function call.

>>> def f():
...  print("-- start --")
...  yield 3
...  print("-- middle --")
...  yield 4
...  print("-- finished --")
>>> gen = f()
>>> next(gen)
-- start --
3
>>> next(gen)
-- middle --
4
>>> next(gen)              
-- finished --
Traceback (most recent call last):
 ...
StopIteration

Unlike a normal function, where calling f() would execute the first print immediately, gen is assigned without any statement in the function body being executed. Only when gen.next() is invoked does the code up to the first yield run. The second call executes the statement printing -- middle -- and stops at the second yield. The third call prints -- finished -- and runs to the end of the function; since no further yield is reached, an exception is raised.

What happens when control returns to the caller after the generator yields? The state of each generator is stored in the generator object. From this point of view the generator function looks as if it were running in a separate thread, but this is just an illusion: execution is strictly single-threaded, and the interpreter simply saves and restores the state between requests for the next value.
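On Python 3.2 or later, inspect.getgeneratorstate lets you peek at that saved state; a quick sketch, reusing f from the walkthrough above:

>>> import inspect
>>> gen = f()
>>> inspect.getgeneratorstate(gen)
'GEN_CREATED'
>>> next(gen)
-- start --
3
>>> inspect.getgeneratorstate(gen)   # suspended at the first yield
'GEN_SUSPENDED'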

Why are generators useful? As noted in the section on iterators, a generator function is just another way to create an iterator object. Everything that can be done with yield statements can also be done with next methods. Nevertheless, using a function and letting the interpreter perform its magic to create the iterator has advantages. A function can be much shorter than a class definition with next and __iter__ methods. What is more important, it is easier for the author of the generator to reason about state that is kept in local variables than about instance attributes that have to be passed around between successive next calls on an iterator object.
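To make the comparison concrete, here is the same iterable sketched both ways (Squares and squares are names invented for this example):

>>> class Squares(object):          # class-based iterator: state in attributes
...     def __init__(self, n):
...         self.i, self.n = 0, n
...     def __iter__(self):
...         return self
...     def next(self):             # named __next__ in Python 3
...         if self.i >= self.n:
...             raise StopIteration
...         self.i += 1
...         return (self.i - 1) ** 2
>>> def squares(n):                 # generator: state in local variables
...     for i in range(n):
...         yield i ** 2
>>> list(Squares(4)) == list(squares(4))
True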

A broader question is why iterators are useful at all. When an iterator is used to power a loop, the loop becomes very simple. The code that initializes the state, decides whether the loop is finished, and finds the next value is extracted into a separate place. This highlights the body of the loop, the interesting part. In addition, the iterator code can be reused in other places.

Bidirectional communication

Each yield statement passes a value to the caller. This is the reason PEP 255 introduced generators (implemented in Python 2.2). But communication in the reverse direction is also useful. One obvious way would be some external statement, a global variable, or a shared mutable object. Direct communication became possible thanks to PEP 342 (implemented in Python 2.5), which turned the previously boring yield statement into an expression. When the generator resumes execution after a yield, the caller can invoke a method on the generator object, either passing a value into the generator, which is then returned by the yield expression, or injecting an exception into the generator through a different method.

The first of the new methods is send(value), which is similar to next(), but passes value into the generator to be used as the value of the yield expression. In fact, g.next() and g.send(None) are equivalent.
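A minimal sketch of send in action (the generator doubler is invented for this example):

>>> def doubler():
...     x = yield              # receives the first value passed via send()
...     while True:
...         x = yield 2 * x    # yields a result, then waits for the next value
>>> g = doubler()
>>> g.send(None)               # equivalent to next(g): run to the first yield
>>> g.send(5)
10
>>> g.send(7)
14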

The second new method is throw(type, value=None, traceback=None), which is equivalent to, at the point of the yield statement:

raise type, value, traceback

Unlike raise (which raises an exception immediately from the current point of execution), throw() first resumes the generator and only then raises the exception. The word throw was chosen because it suggests putting the exception in another location, and it is associated with exceptions in other languages.

What happens when an exception is raised inside the generator? It can be raised explicitly, or injected into a yield statement via the throw() method. In either case the exception propagates in the standard way: it can be intercepted by an except or finally clause, or otherwise it causes the execution of the generator function to abort and propagates to the caller.

For completeness' sake, it is worth mentioning that generator iterators also have a close() method, which can be used to force a generator that would otherwise be able to provide more values to finish immediately. It allows the generator's __del__ method to destroy the objects holding the generator's state.

Let's define a generator which just prints what is passed in through send and throw.

>>> import itertools
>>> def g():
...   print '--start--'
...   for i in itertools.count():
...     print '--yielding %i--' % i
...     try:
...       ans = yield i
...     except GeneratorExit:
...       print '--closing--'
...       raise
...     except Exception as e:
...       print '--yield raised %r--' % e
...     else:
...       print '--yield returned %s--' % ans

>>> it = g()
>>> next(it)
--start--
--yielding 0--
0
>>> it.send(11)
--yield returned 11--
--yielding 1--
1
>>> it.throw(IndexError)
--yield raised IndexError()--
--yielding 2--
2
>>> it.close()
--closing--

Note: next or __next__?

In Python 2.x, the iterator method to retrieve the next value is called next. It is invoked implicitly through the global function next(), which means that it should really be called __next__, just as the global function iter() calls __iter__. This inconsistency is fixed in Python 3.x, where it.next becomes it.__next__. For the other generator methods, send and throw, the situation is more complicated, because they are not called implicitly by the interpreter. Nevertheless, there is a proposed syntax extension to allow continue to take an argument which would be passed to the send of the loop's iterator. If this extension is accepted, it is likely that gen.send will become gen.__send__. The last generator method, close, is pretty obviously named incorrectly, because it is already invoked implicitly.

Chaining generators

Note: This is a preview of PEP 380 (not yet implemented, but already accepted for inclusion in Python 3.3).

Let's say we are writing a generator and we want to yield a number of values generated by a second generator, a subgenerator. If only the yielded values matter, this can be done without much effort with a loop:

subgen = some_other_generator()
for v in subgen:
  yield v

However, if the subgenerator is to interact properly with the caller in the case of calls to send(), throw() and close(), things become considerably more difficult. The yield statement has to be guarded by a try...except...finally structure similar to the one defined in the previous section to "debug" the generator function. Such code is provided in PEP 380; here it is enough to present the new syntax that will be introduced in Python 3.3:

yield from some_other_generator()

Like the explicit loop above, this repeatedly yields values from some_other_generator until it is exhausted, but it also forwards send, throw and close to the subgenerator.
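On Python 3.3 (once available), the delegation would look like this minimal sketch (inner and outer are invented names):

>>> def inner():
...     x = yield 1
...     yield x
>>> def outer():
...     yield from inner()     # forwards next, send, throw and close
...     yield 'done'
>>> g = outer()
>>> next(g)
1
>>> g.send('hello')            # send passes straight through to inner()
'hello'
>>> next(g)
'done'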

