Detailed introduction to yield and generator in Python
This article explains yield and generators in Python, moving from the basics to more advanced usage.
Foreword
This article introduces yield and generators in detail, moving from the basics to more advanced topics. It covers what a generator is, how to create one, the characteristics of generators, basic and advanced application scenarios, and pitfalls to watch for when using them. It does not cover enhanced generators (PEP 342); that material will be introduced later. Unless noted otherwise, the examples use Python 2 syntax.
Generator basics
In Python, any function definition whose body contains a yield expression actually defines a generator function. Calling a generator function returns a generator, which is different from an ordinary function call. For example:
def gen_generator():
    yield 1

def gen_value():
    return 1

if __name__ == '__main__':
    ret = gen_generator()
    print ret, type(ret)  # <generator object gen_generator at 0x02645648> <type 'generator'>
    ret = gen_value()
    print ret, type(ret)  # 1 <type 'int'>
As the code above shows, calling the gen_generator function returns a generator instance.
A generator has the following special features:
• It follows the iterator protocol, that is, it implements the __iter__ and next methods (in Python 3 the latter is named __next__)
• It can be entered and left multiple times, and it can pause the execution of the code in the function body
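The two features above can be checked directly. The following is a minimal sketch in Python 3 spelling, where the method is __next__ and is normally invoked through the built-in next(). The function name countdown is a hypothetical example and does not come from the article.

```python
def countdown(n):
    """Yield n, n-1, ..., 1 (hypothetical example function)."""
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)

# A generator is its own iterator: iter() hands back the same object.
is_own_iterator = iter(gen) is gen

# It exposes both halves of the iterator protocol.
has_protocol = hasattr(gen, '__iter__') and hasattr(gen, '__next__')

first = next(gen)    # runs the body up to the first yield
second = next(gen)   # resumes after that yield and runs to the next one
```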
Let’s take a look at the test code:
>>> def gen_example():
...     print 'before any yield'
...     yield 'first yield'
...     print 'between yields'
...     yield 'second yield'
...     print 'no yield anymore'
...
>>> gen = gen_example()
>>> gen.next()  # first call to next
before any yield
'first yield'
>>> gen.next()  # second call to next
between yields
'second yield'
>>> gen.next()  # third call to next
no yield anymore
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
Calling gen_example prints nothing, which shows that the code in the function body has not started executing yet. When the generator's next method is called, the generator runs up to a yield expression, returns the value of that yield expression, and pauses there. This is why the first call to next prints the first line and returns 'first yield'. Pausing means that the function's local variables, instruction pointer, and execution environment are saved until the next call to next resumes it. The second call to next runs up to the last yield and pauses there. If next() is called a third time, the remaining code runs, and because there is no further yield, a StopIteration exception is raised.
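The claim that local state is saved across pauses can be observed with the standard inspect module. This is only a sketch in Python 3 spelling; running_total is a hypothetical generator function introduced for illustration.

```python
import inspect

def running_total(numbers):
    """Yield the cumulative sum after each number (hypothetical example)."""
    total = 0
    for n in numbers:
        total += n
        yield total

gen = running_total([1, 2, 3])
state_before = inspect.getgeneratorstate(gen)   # body has not started yet

first = next(gen)                               # runs up to the first yield
saved_total = gen.gi_frame.f_locals['total']    # local variable kept while paused
state_paused = inspect.getgeneratorstate(gen)

rest = list(gen)                                # resume until the body finishes
state_done = inspect.getgeneratorstate(gen)
```

The local variable total keeps its value between calls, which is what lets the second and third results continue from the first.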
Because the for statement catches the StopIteration exception automatically, the most common way to consume a generator (or any iterator) is in a loop:
def generator_example():
    yield 1
    yield 2

if __name__ == '__main__':
    for e in generator_example():
        print e
    # output:
    # 1
    # 2
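What the for statement does with StopIteration can be spelled out by hand. The sketch below, written in Python 3 spelling, is roughly equivalent to a for loop; manual_for is a hypothetical helper name used only for illustration.

```python
def generator_example():
    yield 1
    yield 2

def manual_for(iterable):
    """Collect items the way a for loop would (hypothetical helper)."""
    result = []
    iterator = iter(iterable)        # the for statement calls iter() first
    while True:
        try:
            item = next(iterator)    # then calls next() repeatedly
        except StopIteration:        # and stops quietly on StopIteration
            break
        result.append(item)
    return result

collected = manual_for(generator_example())
```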
How does a generator created by a generator function differ from an ordinary function?
(1) An ordinary function starts from its first line on every call, whereas a generator resumes from the point where the last yield paused it.
(2) A function call returns one value (or one set of values) once, whereas a generator can "return" multiple times.
(3) A function can be called any number of times, but a generator instance cannot be resumed after it has yielded its last value or has returned.
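Point (3) is easy to trip over in practice. A small sketch in Python 3 spelling, reusing the article's generator_example: the same generator instance yields nothing on a second pass, while calling the generator function again produces a fresh generator.

```python
def generator_example():
    yield 1
    yield 2

gen = generator_example()
first_pass = list(gen)                   # consumes the generator
second_pass = list(gen)                  # already exhausted, nothing left
fresh_pass = list(generator_example())   # a new call creates a new generator
```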
Using yield in a function and then calling that function is one way to create a generator. Another common way is to use a generator expression, for example:
>>> gen = (x * x for x in xrange(5))
>>> print gen
<generator object <genexpr> at 0x02655710>
Generator applications
Basic generator applications
Why use a generator? The most important reason is that it can produce and "return" results on demand instead of producing all return values at once. Sometimes you do not even know what "all return values" would be.
For example, consider the following code:
RANGE_NUM = 100

for i in [x*x for x in range(RANGE_NUM)]:  # Method 1: iterate over a list
    # do something, for example
    print i

for i in (x*x for x in range(RANGE_NUM)):  # Method 2: iterate over a generator
    # do something, for example
    print i
In the code above, both for statements produce the same output, and textually the only difference is square brackets versus parentheses. The consequences are very different, though: the first form builds a list, while the second builds a generator object. As RANGE_NUM grows, the list built by the first form grows and occupies more memory, whereas the generator in the second form produces one value at a time, so its own memory use stays essentially constant. (In Python 2, range(RANGE_NUM) itself still builds a list in both forms; xrange avoids that.)
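The memory difference can be made visible with sys.getsizeof. This is a rough sketch in Python 3 spelling: getsizeof reports only the container object itself (for a list, the array of references, not the integers it points to), so the numbers understate the list's real footprint. The helper names list_size and gen_size are hypothetical.

```python
import sys

def list_size(n):
    """Size in bytes of the list object holding n squares (hypothetical helper)."""
    return sys.getsizeof([x * x for x in range(n)])

def gen_size(n):
    """Size in bytes of the generator object that would produce n squares."""
    return sys.getsizeof(x * x for x in range(n))

small_list, big_list = list_size(100), list_size(100000)
small_gen, big_gen = gen_size(100), gen_size(100000)
```

The list size grows with n, while the generator object stays small regardless of how many values it will eventually produce.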
Let’s look at an example that can “return” an infinite number of times:
def fib():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a+b
This generator can produce an unlimited number of "return values", and the caller decides when to stop iterating.
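One way for the caller to decide when to stop is to wrap the infinite generator with the standard itertools helpers. A short sketch in Python 3 spelling, using the article's fib:

```python
from itertools import islice, takewhile

def fib():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b

# Stop after a fixed number of values.
first_ten = list(islice(fib(), 10))

# Stop as soon as a condition on the value no longer holds.
below_100 = list(takewhile(lambda x: x < 100, fib()))
```

Calling list(fib()) directly would never finish, so some bound like these is needed whenever the whole result is collected.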
Advanced generator applications
Usage scenario one:
Generators can be used to produce data streams. A generator does not produce a value immediately but waits until the value is needed, which amounts to a pull model driven by the consumer. For example, suppose there is a log file in which each line is one record. People in different departments may process each record differently, but we can provide a common, on-demand data stream for all of them.
def gen_data_from_file(file_name):
    # producer: yield the file one line at a time
    for line in open(file_name):
        yield line

def gen_words(line):
    # producer: yield the words of one line
    for word in (w for w in line.split() if w.strip()):
        yield word

def count_words(file_name):
    # consumer: count how often each word occurs
    word_map = {}
    for line in gen_data_from_file(file_name):
        for word in gen_words(line):
            if word not in word_map:
                word_map[word] = 0
            word_map[word] += 1
    return word_map

def count_total_chars(file_name):
    # consumer: count the characters in the file
    total = 0
    for line in gen_data_from_file(file_name):
        total += len(line)
    return total

if __name__ == '__main__':
    print count_words('test.txt'), count_total_chars('test.txt')
The example above comes from a talk at PyCon 2008. gen_data_from_file and gen_words are the data producers, and count_words and count_total_chars are the data consumers. As you can see, data is pulled only when it is needed rather than prepared in advance. In addition, the expression (w for w in line.split() if w.strip()) inside gen_words is a generator expression and therefore also creates a generator.
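The same producer/consumer shape can be tried without a file on disk. The sketch below is a Python 3 variant under stated assumptions: gen_lines takes any iterable of lines (a hypothetical stand-in for reading test.txt), gen_words here accepts a stream of lines rather than a single line, and collections.Counter replaces the hand-written dictionary.

```python
from collections import Counter

def gen_lines(source):
    """Producer: yield lines one at a time from any iterable (assumed stand-in for a file)."""
    for line in source:
        yield line

def gen_words(lines):
    """Producer: yield every word of every line, pulling lines only as needed."""
    for line in lines:
        for word in line.split():
            yield word

def count_words(source):
    """Consumer: count word occurrences."""
    return Counter(gen_words(gen_lines(source)))

def count_total_chars(source):
    """Consumer: count characters, including newlines."""
    return sum(len(line) for line in gen_lines(source))

sample = ["a b a\n", "b c\n"]
word_counts = count_words(sample)
total_chars = count_total_chars(sample)
```

Because an open file object is itself an iterable of lines, passing one in place of sample should work the same way.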
Usage scenario two:
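In some programming scenarios, a task needs to run part of its logic, then wait for a period of time, for an asynchronous result, or for some state, and then continue with the rest of its logic. For example, in a microservice architecture, service A runs some logic, requests data from service B, and then continues on service A. In game programming, a skill may be split into several stages: part of the action (effect) is performed first, then there is a wait, and then the skill continues. For situations like these, where waiting is required but blocking is undesirable, callbacks are the usual approach. Here is a simple example: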
def do(a):
    print 'do', a
    CallBackMgr.callback(5, lambda a = a: post_do(a))

def post_do(a):
    print 'post_do', a
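Here CallBackMgr registers a callback to fire 5 seconds later, and after 5 seconds the lambda function is called. As you can see, one piece of logic is split across two functions, and context has to be passed along (such as the parameter a here). Let's rewrite this example with yield, where the yielded value represents the time to wait.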
@yield_dec
def do(a):
    print 'do', a
    yield 5
    print 'post_do', a
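This requires implementing a YieldManager. The yield_dec decorator registers the generator created by do with the YieldManager, which calls its next method 5 seconds later. The yield version provides the same functionality as the callback version but reads much more clearly.
A simple implementation is given below for reference: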
# -*- coding:utf-8 -*-
import sys
# import Timer
import types
import time

class YieldManager(object):
    def __init__(self, tick_delta = 0.01):
        self.generator_dict = {}  # maps generator -> time at which to resume it
        # self._tick_timer = Timer.addRepeatTimer(tick_delta, lambda: self.tick())

    def tick(self):
        # resume every generator whose deadline has passed
        cur = time.time()
        for gene, t in self.generator_dict.items():
            if cur >= t:
                self._do_resume_genetator(gene, cur)

    def _do_resume_genetator(self, gene, cur):
        try:
            self.on_generator_excute(gene, cur)
        except StopIteration, e:
            # the generator has finished
            self.remove_generator(gene)
        except Exception, e:
            print 'unexpected error', type(e)
            self.remove_generator(gene)

    def add_generator(self, gen, deadline):
        self.generator_dict[gen] = deadline

    def remove_generator(self, gene):
        del self.generator_dict[gene]

    def on_generator_excute(self, gen, cur_time = None):
        # run the generator to its next yield; the yielded value is the delay
        t = gen.next()
        cur_time = cur_time or time.time()
        self.add_generator(gen, t + cur_time)

g_yield_mgr = YieldManager()

def yield_dec(func):
    def _inner_func(*args, **kwargs):
        gen = func(*args, **kwargs)
        if type(gen) is types.GeneratorType:
            g_yield_mgr.on_generator_excute(gen)
        return gen
    return _inner_func

@yield_dec
def do(a):
    print 'do', a
    yield 2.5
    print 'post_do', a
    yield 3
    print 'post_do again', a

if __name__ == '__main__':
    do(1)
    for i in range(1, 10):
        print 'simulate a timer, %s seconds passed' % i
        time.sleep(1)
        g_yield_mgr.tick()
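The same idea can be sketched compactly in Python 3 with a simulated clock, so that it runs instantly instead of sleeping. This is not the article's YieldManager: the Scheduler class, its method names, and the use of heapq are assumptions made for illustration only.

```python
import heapq
import itertools

class Scheduler:
    """Resume each generator after the delay it yields, on a simulated clock."""

    def __init__(self):
        self.now = 0.0
        self._queue = []                # heap of (deadline, seq, generator)
        self._seq = itertools.count()   # tie-breaker so generators are never compared

    def spawn(self, gen):
        """Start a generator: run it to its first yield and schedule the rest."""
        self._step(gen)

    def _step(self, gen):
        try:
            delay = next(gen)           # run until the next yield
        except StopIteration:
            return                      # generator finished, nothing to schedule
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), gen))

    def run(self):
        """Jump the clock from deadline to deadline until nothing is waiting."""
        while self._queue:
            deadline, _, gen = heapq.heappop(self._queue)
            self.now = deadline
            self._step(gen)

log = []
sched = Scheduler()

def do(a):
    log.append(('do', a, sched.now))
    yield 2.5
    log.append(('post_do', a, sched.now))
    yield 3
    log.append(('post_do again', a, sched.now))

sched.spawn(do(1))
sched.run()
```

A real timer would replace the simulated clock, but the structure stays the same: the yielded value is treated as a delay, and StopIteration signals that the task is done.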
Notes:
(1) yield cannot be nested implicitly!
def visit(data):
    for elem in data:
        if isinstance(elem, tuple) or isinstance(elem, list):
            visit(elem)  # the value returned here is a generator
        else:
            yield elem

if __name__ == '__main__':
    for e in visit([1, 2, (3, 4), 5]):
        print e
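The code above is meant to visit every element of a nested sequence. The expected output is 1 2 3 4 5, but the actual output is 1 2 5. Why? As the comment indicates, visit is a generator function, so the call on line 4 returns a generator object, and the code never iterates over that generator instance. The fix is to change the code so that it iterates over this temporary generator: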
def visit(data):
    for elem in data:
        if isinstance(elem, tuple) or isinstance(elem, list):
            for e in visit(elem):
                yield e
        else:
            yield elem
Alternatively, in Python 3.3 and later you can use yield from, a syntax that was added by PEP 380:
def visit(data):
    for elem in data:
        if isinstance(elem, tuple) or isinstance(elem, list):
            yield from visit(elem)
        else:
            yield elem
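The broken and fixed versions can be compared side by side. A small sketch for Python 3.3 or later; visit_broken is a hypothetical name for the first, faulty version.

```python
def visit_broken(data):
    """Faulty version: the nested generator is created but never iterated."""
    for elem in data:
        if isinstance(elem, (tuple, list)):
            visit_broken(elem)          # result discarded, nothing is yielded
        else:
            yield elem

def visit(data):
    """Fixed version: delegate to the nested generator with yield from."""
    for elem in data:
        if isinstance(elem, (tuple, list)):
            yield from visit(elem)
        else:
            yield elem

nested = [1, 2, (3, 4), 5]
broken_result = list(visit_broken(nested))
fixed_result = list(visit(nested))
```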
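(2) Using return in a generator function
The Python documentation states explicitly that return may be used in a generator function. When the generator reaches a return statement, a StopIteration exception is raised.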
def gen_with_return(range_num):
    if range_num < 0:
        return
    else:
        for i in xrange(range_num):
            yield i

if __name__ == '__main__':
    print list(gen_with_return(-1))
    print list(gen_with_return(1))
However, in Python 2 a return inside a generator function cannot carry a return value:
def gen_with_return(range_num):
    if range_num < 0:
        return 0
    else:
        for i in xrange(range_num):
            yield i
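In Python 2, the code above fails with: SyntaxError: 'return' with argument inside generator. (Since Python 3.3, PEP 380 allows return with a value inside a generator; the value is attached to the StopIteration exception.)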
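For readers on Python 3.3 or later, where the restriction above no longer applies, the following sketch shows where a generator's return value ends up. It goes slightly beyond the Python 2 behavior the article describes; drain is a hypothetical helper, and the returned strings are made up for illustration.

```python
def gen_with_return(range_num):
    if range_num < 0:
        return 'negative input'      # allowed since Python 3.3
    for i in range(range_num):
        yield i
    return 'done'

def drain(gen):
    """Collect yielded values and the return value (hypothetical helper)."""
    values = []
    while True:
        try:
            values.append(next(gen))
        except StopIteration as stop:
            # the generator's return value travels on the exception
            return values, stop.value

empty_case = drain(gen_with_return(-1))
normal_case = drain(gen_with_return(2))
```

A plain for loop or list() discards the return value, so it is only visible when StopIteration is caught by hand or when the generator is delegated to with yield from.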