
Detailed introduction to yield and generator in python

Y2J
2017-04-27 11:55:38

This article explains yield and generators in Python, starting with the basics and moving on to more advanced usage. The explanation is detailed and should serve as a useful reference.

Foreword

This article introduces yield and generators in detail, from the basics to more advanced topics: what a generator is, how to create one, the features of generators, basic and advanced application scenarios, and precautions when using them. It does not cover enhanced generators or the PEP 342 material; that part will be introduced later.

generator basics

In Python, any function definition that contains a yield expression actually defines a generator function. Calling a generator function returns a generator, which is different from an ordinary function call. For example:

def gen_generator():
    yield 1

def gen_value():
    return 1

if __name__ == '__main__':
    ret = gen_generator()
    print ret, type(ret)  # <generator object gen_generator at 0x02645648> <type 'generator'>
    ret = gen_value()
    print ret, type(ret)  # 1 <type 'int'>

As the code above shows, calling gen_generator returns a generator instance.

A generator has the following special features:

•It follows the iterator protocol, so it implements the __iter__ and next methods (next is named __next__ in Python 3)

•It can be entered and returned from multiple times, and execution of the function body can be paused
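
As a quick check of the first point, here is a minimal sketch. It assumes Python 3, where the method is called __next__ and is normally invoked through the built-in next(); the name countdown is hypothetical and used only for illustration.

```python
def countdown(n):
    """Hypothetical generator used only for illustration."""
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)

# A generator is its own iterator: iter() hands back the same object.
same_object = iter(gen) is gen

# Both halves of the iterator protocol are present.
has_protocol = hasattr(gen, '__iter__') and hasattr(gen, '__next__')

first = next(gen)        # runs the body up to the first yield
remaining = list(gen)    # consumes whatever is left

print(same_object, has_protocol, first, remaining)
```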

Let’s take a look at the test code:

>>> def gen_example():
...     print 'before any yield'
...     yield 'first yield'
...     print 'between yields'
...     yield 'second yield'
...     print 'no yield anymore'
...
>>> gen = gen_example()
>>> gen.next()    # first call to next
before any yield
'first yield'
>>> gen.next()    # second call to next
between yields
'second yield'
>>> gen.next()    # third call to next
no yield anymore
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Calling gen_example outputs nothing, which shows that the code in the function body has not started executing yet. When the generator's next method is called, the generator runs up to a yield expression, returns the value of that expression, and then pauses (suspends) at that point; this is why the first call to next prints the first sentence and returns 'first yield'. Pausing means that the local variables, instruction pointer, and execution environment are saved until the next call to next resumes them. After the second call to next, the generator is paused at the last yield. Calling next() again runs the rest of the body and then raises a StopIteration exception.
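
The saved local state described above can be seen in a small sketch. It assumes Python 3 (next(gen) rather than gen.next()), and the name running_total is hypothetical.

```python
def running_total(values):
    """Hypothetical generator: yields the cumulative sum of values."""
    total = 0                 # local state, kept alive between calls
    for v in values:
        total += v
        yield total

gen = running_total([10, 20, 30])
results = [next(gen), next(gen), next(gen)]   # total survives each pause

try:
    next(gen)                 # nothing left to yield
    exhausted = False
except StopIteration:
    exhausted = True

print(results, exhausted)
```

The variable total is never reset between calls, because each next() resumes the same suspended frame instead of starting the function again.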

Because the for statement automatically catches the StopIteration exception, the more common way to use a generator (or, in fact, any iterator) is in a loop:

def generator_example():
    yield 1
    yield 2

if __name__ == '__main__':
    for e in generator_example():
        print e
    # output 1 2
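
To make the role of StopIteration concrete, the loop above can be written out by hand. This is only a rough sketch of what the for statement does, assuming Python 3; the names collected and it are illustrative.

```python
def generator_example():
    yield 1
    yield 2

collected = []
it = iter(generator_example())   # what the for statement does first
while True:
    try:
        e = next(it)             # ask for the next value
    except StopIteration:        # the signal that the loop is over
        break
    collected.append(e)

print(collected)
```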

What is the difference between a generator produced by a generator function and an ordinary function?

 (1) A function starts running from its first line on every call, while a generator resumes from the point where the last yield paused it.

 (2) A function call returns one value (or one set of values) in a single shot, while a generator can "return" multiple times.

 (3) A function can be called repeatedly any number of times, but a generator instance cannot continue to be used once it has yielded its last value or has returned.
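
Point (3) is easy to trip over in practice, so here is a small sketch of it, assuming Python 3; three_values is a hypothetical name.

```python
def three_values():
    """Hypothetical generator used only for illustration."""
    yield 'a'
    yield 'b'
    yield 'c'

gen = three_values()
first_pass = list(gen)       # drains the generator
second_pass = list(gen)      # the same instance has nothing left

# Calling the generator function again builds a brand-new generator.
fresh_pass = list(three_values())

print(first_pass, second_pass, fresh_pass)
```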

Using yield in a function and then calling that function is one way to create a generator. Another common way is a generator expression, for example:

  >>> gen = (x * x for x in xrange(5))
  >>> print gen
  <generator object <genexpr> at 0x02655710>

generator application

Generator Basic Application

Why use a generator? The most important reason is that it can produce and "return" results on demand instead of producing all return values at once; sometimes you do not even know what "all the return values" are.

For example, consider the following code:

RANGE_NUM = 100
for i in [x*x for x in range(RANGE_NUM)]:  # first approach: iterate over a list
    # do sth for example
    print i

for i in (x*x for x in range(RANGE_NUM)):  # second approach: iterate over a generator
    # do sth for example
    print i

In the code above, the two for statements produce the same output, and the only visible difference in the code is square brackets versus parentheses. That difference matters a great deal, though: the first approach builds a list, while the second builds a generator object. As RANGE_NUM grows, the list built by the first approach grows and occupies more memory, whereas the generator built by the second approach does not.
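
The memory difference can be observed directly. The sketch below assumes Python 3 and uses sys.getsizeof, which reports only the size of the container object itself (not the integers it refers to), so treat the numbers as a rough indication rather than a full memory profile.

```python
import sys

RANGE_NUM = 100000

squares_list = [x * x for x in range(RANGE_NUM)]   # every result stored up front
squares_gen = (x * x for x in range(RANGE_NUM))    # results produced on demand

list_size = sys.getsizeof(squares_list)   # grows with RANGE_NUM
gen_size = sys.getsizeof(squares_gen)     # stays small regardless of RANGE_NUM

print(list_size > gen_size)

# Both still describe the same sequence of values.
same_total = sum(squares_list) == sum(squares_gen)
```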

Let’s look at an example that can “return” an infinite number of times:

def fib():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a+b

This generator can produce an unlimited number of "return values", and the caller decides when to stop iterating.
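
Two common ways for the caller to stop an infinite generator are sketched below, assuming Python 3: itertools.islice from the standard library, and a plain break. The limits 10 and 100 are arbitrary choices for illustration.

```python
from itertools import islice

def fib():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b

first_ten = list(islice(fib(), 10))   # take only the first ten values

below_100 = []
for value in fib():
    if value >= 100:                  # the caller decides when to stop
        break
    below_100.append(value)

print(first_ten)
print(below_100)
```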

generator advanced application

Usage scenario one:

A generator can be used to produce a data stream. A generator does not produce a value immediately but waits until one is requested, so the consumer actively pulls the data. For example, suppose there is a log file in which each line is one record. People in different departments may process each record differently, but we can provide a common, on-demand data stream.

def gen_data_from_file(file_name):
    # yield the lines of the file one at a time
    for line in open(file_name):
        yield line

def gen_words(line):
    # yield each non-empty word in a line
    for word in (w for w in line.split() if w.strip()):
        yield word

def count_words(file_name):
    word_map = {}
    for line in gen_data_from_file(file_name):
        for word in gen_words(line):
            if word not in word_map:
                word_map[word] = 0
            word_map[word] += 1
    return word_map

def count_total_chars(file_name):
    total = 0
    for line in gen_data_from_file(file_name):
        total += len(line)
    return total

if __name__ == '__main__':
    print count_words('test.txt'), count_total_chars('test.txt')

The example above comes from a talk at PyCon 2008. gen_words and gen_data_from_file are the data producers, while count_words and count_total_chars are the data consumers. As you can see, data is pulled only when it is needed rather than prepared in advance. In addition, (w for w in line.split() if w.strip()) in gen_words is a generator expression, so it also creates a generator.
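
The same producer and consumer arrangement can be tried without a file on disk. The following is a sketch under stated assumptions: it targets Python 3, replaces the log file with an in-memory io.StringIO, and uses the hypothetical names SAMPLE and gen_lines; it chains the generators into one pipeline instead of nesting loops.

```python
import io
from collections import Counter

# Hypothetical stand-in for a log file so the sketch runs without touching disk.
SAMPLE = "error disk full\nwarn disk slow\nerror net down\n"

def gen_lines(stream):
    for line in stream:            # producer: one line at a time
        yield line

def gen_words(lines):
    for line in lines:             # transformer: pulls lines only as needed
        for word in line.split():
            yield word

def count_words(stream):
    return Counter(gen_words(gen_lines(stream)))   # consumer drives the pipeline

def count_total_chars(stream):
    return sum(len(line) for line in gen_lines(stream))

word_counts = count_words(io.StringIO(SAMPLE))
total_chars = count_total_chars(io.StringIO(SAMPLE))
print(word_counts['error'], word_counts['disk'], total_chars)
```

No stage holds more than one line at a time; each consumer pulls exactly as much as it needs from the stage before it.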

Usage scenario two:

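In some programming scenarios, a task may need to run part of its logic, then wait for a period of time, for an asynchronous result, or for some state, and then continue with another part of its logic. In a microservice architecture, for example, service A runs some logic, requests data from service B, and then continues executing on service A. Or in game programming, a skill may be split into several stages: part of the action (effect) is performed first, then there is a wait, and then it continues. For cases like this, which need to wait but should not block, we generally use callbacks. Here is a simple example:

def do(a):
    print 'do', a
    CallBackMgr.callback(5, lambda a = a: post_do(a))

def post_do(a):
    print 'post_do', a

Here CallBackMgr registers a timer that fires after 5 s and then calls the lambda. As you can see, one piece of logic is split across two functions, and context has to be passed along as well (the parameter a in this case). Let's rewrite the example with yield, where the yielded value represents the time to wait.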

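@yield_dec
def do(a):
    print 'do', a
    yield 5
    print 'post_do', a

This requires a YieldManager: the yield_dec decorator registers the do generator with the YieldManager, which calls its next method after 5 s. The yield version provides the same functionality as the callback version but reads much more clearly.

A simple implementation is given below for reference: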

# -*- coding:utf-8 -*-
import sys
# import Timer
import types
import time

class YieldManager(object):
    def __init__(self, tick_delta = 0.01):
        # maps each generator to the time at which it should be resumed
        self.generator_dict = {}
        # self._tick_timer = Timer.addRepeatTimer(tick_delta, lambda: self.tick())

    def tick(self):
        cur = time.time()
        for gene, t in self.generator_dict.items():
            if cur >= t:
                self._do_resume_genetator(gene,cur)

    def _do_resume_genetator(self,gene, cur ):
        try:
            self.on_generator_excute(gene, cur)
        except StopIteration,e:
            self.remove_generator(gene)
        except Exception, e:
            print 'unexpected error', type(e)
            self.remove_generator(gene)

    def add_generator(self, gen, deadline):
        self.generator_dict[gen] = deadline

    def remove_generator(self, gene):
        del self.generator_dict[gene]

    def on_generator_excute(self, gen, cur_time = None):
        # run the generator to its next yield; the yielded value is the delay
        t = gen.next()
        cur_time = cur_time or time.time()
        self.add_generator(gen, t + cur_time)

g_yield_mgr = YieldManager()

def yield_dec(func):
    def _inner_func(*args, **kwargs):
        gen = func(*args, **kwargs)
        if type(gen) is types.GeneratorType:
            g_yield_mgr.on_generator_excute(gen)

        return gen
    return _inner_func

@yield_dec
def do(a):
    print 'do', a
    yield 2.5
    print 'post_do', a
    yield 3
    print 'post_do again', a

if __name__ == '__main__':
    do(1)
    for i in range(1, 10):
        print 'simulate a timer, %s seconds passed' % i
        time.sleep(1)
        g_yield_mgr.tick()
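
For readers on Python 3, the same idea can be sketched more compactly. This is not a port of the implementation above but a simplified illustration under several assumptions: it omits the decorator, replaces time.time() and time.sleep() with a hypothetical simulated clock (a one-element list) so it runs instantly, and records events in a log list instead of printing them. The names start, deadlines, clock, and log are all illustrative.

```python
class YieldManager:
    """Minimal sketch: resumes generators once their wake-up time has passed."""

    def __init__(self):
        self.deadlines = {}          # generator -> time at which to resume it

    def start(self, gen, now):
        self._resume(gen, now)

    def tick(self, now):
        # Copy the items so the dict can be changed while we loop.
        for gen, deadline in list(self.deadlines.items()):
            if now >= deadline:
                self._resume(gen, now)

    def _resume(self, gen, now):
        try:
            delay = next(gen)        # run until the next yield
        except StopIteration:
            self.deadlines.pop(gen, None)   # finished: forget it
        else:
            self.deadlines[gen] = now + delay

log = []                             # records what ran and when

def do(a, clock):
    log.append(('do', a, clock[0]))
    yield 2.5                        # "wake me in 2.5 time units"
    log.append(('post_do', a, clock[0]))
    yield 3
    log.append(('post_do again', a, clock[0]))

clock = [0]                          # hypothetical simulated clock, no real sleeping
mgr = YieldManager()
mgr.start(do(1, clock), clock[0])
for t in range(1, 10):
    clock[0] = t
    mgr.tick(t)

print(log)
```

With ticks arriving once per time unit, the first wait of 2.5 is noticed at tick 3 and the second wait of 3 ends at tick 6, after which the finished generator is dropped from the manager.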

Precautions:

(1) Yield cannot be nested!

def visit(data):
    for elem in data:
        if isinstance(elem, tuple) or isinstance(elem, list):
            visit(elem)  # the value returned here is a generator
        else:
            yield elem

if __name__ == '__main__':
    for e in visit([1, 2, (3, 4), 5]):
        print e

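The code above visits every element in a nested sequence. The expected output is 1 2 3 4 5, but the actual output is 1 2 5. Why? As the comment indicates, visit is a generator function, so the recursive visit(elem) call on line 4 returns a generator object, and the code never iterates over that generator instance. The fix is to change the code so that it iterates over this temporary generator.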

def visit(data):
    for elem in data:
        if isinstance(elem, tuple) or isinstance(elem, list):
            for e in visit(elem):
                yield e
        else:
            yield elem

Alternatively, in Python 3.3 and later you can use yield from, a syntax added by PEP 380:

def visit(data):
    for elem in data:
        if isinstance(elem, tuple) or isinstance(elem, list):
            yield from visit(elem)
        else:
            yield elem
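
The broken and the fixed versions can be compared side by side. This sketch assumes Python 3.3 or later (for yield from); visit_broken is a hypothetical name given to the buggy variant so both can live in one file.

```python
def visit_broken(data):
    for elem in data:
        if isinstance(elem, (tuple, list)):
            visit_broken(elem)        # creates a generator and discards it
        else:
            yield elem

def visit(data):
    for elem in data:
        if isinstance(elem, (tuple, list)):
            yield from visit(elem)    # delegate to the nested generator
        else:
            yield elem

nested = [1, 2, (3, 4), 5]
broken_result = list(visit_broken(nested))
fixed_result = list(visit(nested))
print(broken_result)
print(fixed_result)
```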

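(2) Using return in a generator function

The Python documentation states explicitly that return may be used in a generator function; when the generator reaches it, a StopIteration exception is raised.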

def gen_with_return(range_num):
    if range_num < 0:
        return
    else:
        for i in xrange(range_num):
            yield i

if __name__ == '__main__':
    print list(gen_with_return(-1))
    print list(gen_with_return(1))

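However, a return in a generator function cannot carry a return value (this restriction applies to Python 2; Python 3.3 and later allow it):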


def gen_with_return(range_num):
    if range_num < 0:
        return 0
    else:
        for i in xrange(range_num):
            yield i

The code above fails with: SyntaxError: 'return' with argument inside generator
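
This SyntaxError is what Python 2 reports. As an aside for readers on Python 3.3 or later, where PEP 380 allows return with a value inside a generator, the sketch below shows where that value ends up: it is attached to the StopIteration exception. The returned strings are arbitrary illustrative values.

```python
def gen_with_return(range_num):
    if range_num < 0:
        return 'negative input'      # allowed in Python 3.3+, ends the generator
    for i in range(range_num):
        yield i
    return 'done'

empty = list(gen_with_return(-1))    # list() swallows StopIteration, value is lost
values = list(gen_with_return(2))

gen = gen_with_return(-1)
try:
    next(gen)
    returned = None
except StopIteration as stop:
    returned = stop.value            # the returned object rides on the exception

print(empty, values, returned)
```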
