Commonly used PEP 8 conventions and Python tricks

不言

May 05, 2018 pm 02:50 PM

This article collects commonly used PEP 8 conventions and a number of Python tricks, shared here for anyone who finds them useful.

Preface

Collected and summarized from several sources. The aim is to follow these conventions from now on, combine them with a few Python tricks, and write more Pythonic code.

PEP 8 coding conventions

The original English text is PEP 8, the Style Guide for Python Code.

The following was compiled by @bobo; for the original post, see his summary of the PEP 8 Python coding conventions.

Code layout

Indentation. Use 4 spaces per level (every editor can do this). Do not use tabs, and never mix tabs and spaces.
The maximum line length is 79 characters. A backslash can be used to continue a line, but wrapping the expression in parentheses is preferred. Older versions of PEP 8 put the line break after the operator; the current text accepts either as long as it is consistent locally and suggests breaking before binary operators in new code.
Two blank lines between classes and top-level function definitions; one blank line between method definitions in a class; one blank line between logically unrelated sections inside a function; avoid blank lines elsewhere.
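A minimal sketch of the layout rules above, assuming Python 3. The names total_price and Cart are made up for illustration; the point is the 4-space indentation, the parenthesized continuation line, and the placement of blank lines.

```python
def total_price(unit_price, quantity, tax_rate, discount):
    """Return the price after tax and discount, rounded to 2 places."""
    # The open parenthesis lets the call continue on the next line
    # without a backslash.
    return round(unit_price * quantity * (1 + tax_rate) - discount,
                 2)


class Cart:
    """A tiny container used only to show blank-line conventions."""

    def __init__(self):
        self.items = []

    def add(self, price):
        self.items.append(price)

    def total(self):
        return sum(self.items)
```

Two blank lines separate the top-level function from the class, and one blank line separates the methods.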

Document layout

Order of module contents: module description and docstring, then imports, then globals and constants, then other definitions. Imports are grouped in the order standard library, third party, then your own modules, with a blank line between groups.
Do not import several modules in one import statement; for example, import os, sys is discouraged.
If you import with from XX import XX, you can omit the 'module.' prefix, but name clashes become possible. When that happens, use import XX instead.

Use of spaces

Do not add a space before any closing bracket.
Do not add a space before a comma, colon, or semicolon.
Do not add a space before the opening parenthesis of a function call, for example Func(1).
Do not add a space before the opening bracket of a sequence index, for example list[2].
Put one space on each side of an operator; do not add extra spaces for alignment.
Omit the spaces around the = used for a default parameter value.
Do not put several statements on one line, even though ';' allows it.
In if/for/while statements, the body must start on a new line even when it is a single statement.
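The whitespace rules above can be seen together in a short sketch, assuming Python 3. The function scale and the variable names are hypothetical and exist only to show the spacing.

```python
def scale(values, factor=2):
    # No spaces around "=" in the default value above.
    result = []
    for value in values:
        # The loop body sits on its own line; the operator has one
        # space on each side.
        result.append(value * factor)
    return result


# No space before commas, closing brackets, the index bracket,
# or the opening parenthesis of a call.
numbers = [1, 2, 3]
first = numbers[0]
doubled = scale(numbers)
```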

Comments

General principle: a wrong comment is worse than no comment, so when code changes, update the comments first. Comments must be in English, preferably in complete sentences with the first letter capitalized. A sentence must end with a terminator, followed by two spaces before the next sentence. For a short phrase, the terminator can be omitted.

Block comment: a comment placed before a piece of code. Put a space after '#'. Separate paragraphs with a line that contains only '#'. For example:

# Description : Module config.
#
# Input : None
#
# Output : None

Inline comment: a comment placed after a line of code. For example:

x = x + 1  # Increment x

Use this style as little as possible.
Avoid unnecessary comments.

Document description

Write docstrings for all public modules, functions, classes, and methods. They are not necessary for non-public ones, but a comment can be written instead (on the line after def).
A single-line docstring looks like this:

def kos_root():
    """Return the pathname of the KOS root directory."""
    global _kos_root
    if _kos_root: return _kos_root
    ...

Naming rules

General principle: new code must follow the naming styles below, and code in an existing library should keep that library's style as far as possible. Never use lowercase 'l', uppercase 'I', or uppercase 'O' as a single-character name.

Keep module names as short as possible, in all lowercase. Underscores are allowed.
Keep package names as short as possible, in all lowercase. Underscores are discouraged.
Classes are named in CapWords style; classes used only inside a module are named in _CapWords style.
Exception names use CapWords plus the Error suffix.
Global variables should be visible only inside the module where possible, similar to static in C. There are two ways to do this: the __all__ mechanism, or a leading underscore.
Function names are all lowercase; underscores are allowed.
Constant names are all uppercase; underscores are allowed.
Class attributes (methods and variables) are all lowercase; underscores are allowed.
Class attributes have three scopes: public, non-public, and subclass API, roughly public, private, and protected in C++. Non-public attributes take a leading underscore.
If an attribute name clashes with a keyword, append a trailing underscore rather than abbreviating or respelling it.
To avoid clashes with subclass attributes, prefix selected attributes with two underscores. For example, if __a is declared in class Foo, it can only be reached as Foo._Foo__a, which avoids ambiguity. If the subclass is also called Foo, there is nothing you can do.
The first parameter of an instance method must be self, and the first parameter of a class method must be cls.
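The naming rules above can be sketched in one small module, assuming Python 3. Every name here (MAX_RETRIES, ParseError, HttpClient and its members) is invented for illustration. In Python, instance methods receive self, class methods receive cls, and static methods receive no implicit first argument.

```python
MAX_RETRIES = 3  # constant: all uppercase with underscores


class ParseError(Exception):
    """Exception names use CapWords plus the Error suffix."""


class HttpClient:
    """Class names use CapWords."""

    def __init__(self, base_url):
        self.base_url = base_url   # public attribute
        self._session = None       # non-public: one leading underscore
        self.__token = "secret"    # name-mangled to _HttpClient__token

    def get_url(self, path):
        """Instance method: the first parameter is self."""
        return self.base_url + "/" + path

    @classmethod
    def from_host(cls, host):
        """Class method: the first parameter is cls."""
        return cls("http://" + host)

    @staticmethod
    def is_secure(url):
        """Static method: no implicit first parameter."""
        return url.startswith("https://")


client = HttpClient.from_host("example.com")
```

Outside the class, the double-underscore attribute is only reachable under its mangled name, client._HttpClient__token.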

Programming suggestions

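Keep other Python implementations in mind when coding. For example, building strings with the '+' operator is efficient in CPython but slow in Jython, so use the .join() method instead.
Use 'is' and 'is not' rather than '==' when comparing with singletons such as None; for example, if x is not None is better than if x when the intent is to test for None.
Use class-based exceptions. Each module or package should have its own exception class, derived from Exception.
Do not use a bare except in exception handling; follow except with specific exceptions. For example:

try:
    ...
except Exception as ex:
    print ex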
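The exception advice in this section (a module-level exception class, a specific except clause, and as little code as possible inside try) can be sketched as follows, assuming Python 3. ConfigError and parse_port are hypothetical names used only for this example.

```python
class ConfigError(Exception):
    """Base exception for this hypothetical module."""


def parse_port(text):
    """Convert text to a port number, raising ConfigError on bad input."""
    try:
        # Only the call that can fail sits inside try.
        port = int(text)
    except ValueError as ex:
        # Catch the specific exception instead of using a bare except.
        raise ConfigError("invalid port: %r" % text) from ex
    if not 0 < port < 65536:
        raise ConfigError("port out of range: %d" % port)
    return port
```

Callers can then catch ConfigError alone, without needing to know which built-in exception was raised underneath.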

Keep the code inside try as small as possible.
Use startswith() and endswith() instead of slicing to check for a prefix or suffix.

foo = 'abc000xyz'
if foo.startswith('abc') and foo.endswith('xyz'):
    print 'yes'
else:
    print 'no'
# yes

# The following style is discouraged
if foo[:3]=='abc' and foo[-3:]=='xyz':
    print 'yes'
else:
    print 'no'

Use isinstance() to compare object types. For example:

foo = 'abc000xyz'
# Recommended
print isinstance(foo,int) # False
# Discouraged
print type(foo) == type('1') # True

To test whether a sequence is empty, follow this rule:

foo = 'abc000xyz'
if foo:
    print "not empty"
else:
    print "empty"

# The following style is discouraged
if len(foo):
    print "not empty"
else:
    print "empty"

Test boolean values with if boolvalue rather than comparing them with True or False.
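Returning to the string-concatenation advice at the start of this section, here is a minimal sketch assuming Python 3. Both loops produce the same text; the difference is that str.join is guaranteed to run in linear time on every implementation, while repeated += depends on an optimization that not every interpreter has.

```python
words = ["PEP", "8", "style", "guide"]

# Repeated += may copy the whole string on each step in some
# implementations.
slow = ""
for word in words:
    slow += word + " "
slow = slow.rstrip()

# str.join builds the result from the list in a single call.
fast = " ".join(words)
```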

Scoring your own code

Use pylint to check your code; see @permilk, "Python code analysis tools: PyChecker, Pylint".

# Install
pip install pylint

Write some test code and save it as test.py

# -*- coding:utf-8 -*-
# Background: http://blog.csdn.net/morewindows/article/details/6684558
# Code from: http://www.cnblogs.com/yekwol/p/5778040.html
def partition(vec, left, right):
    key = vec[left]
    low = left
    high = right
    while low < high:
        while (low < high) and (vec[high] >= key):
            high -= 1
        vec[low] = vec[high]
        while (low < high) and (vec[low] <= key):
            low += 1
        vec[high] = vec[low]
        vec[low] = key
    return low

# Build the sort recursively
def quicksort(vec, left, right):
    if left < right:
        p = partition(vec, left, right)
        quicksort(vec, left, p-1)  # Handle each partition the same way
        quicksort(vec, p+1, right)
    return vec

#s = [6, 8, 1, 4, 3, 9, 5, 4, 11, 2, 2, 15, 6]
before_list = [4, 6, 1, 3, 5, 9]
print "before sort:", before_list
after_list = quicksort(before_list, left=0, right=len(before_list)-1)
print "after sort:", after_list

Run the style check

# Run
pylint test.py
# Output
Problem importing module variables.py: No module named functools_lru_cache
Problem importing module variables.pyc: No module named functools_lru_cache
No config file found, using default configuration
************* Module test
C: 24, 0: Trailing whitespace (trailing-whitespace)
C:  1, 0: Missing module docstring (missing-docstring)
C:  5, 0: Missing function docstring (missing-docstring)
C: 20, 0: Missing function docstring (missing-docstring)
C: 22, 8: Invalid variable name "p" (invalid-name)
C: 28, 0: Invalid constant name "before_list" (invalid-name)
C: 30, 0: Invalid constant name "after_list" (invalid-name)


Report
======
23 statements analysed.

Statistics by type
------------------

+---------+-------+-----------+-----------+------------+---------+
|type     |number |old number |difference |%documented |%badname |
+=========+=======+===========+===========+============+=========+
|module   |1      |1          |=          |0.00        |0.00     |
+---------+-------+-----------+-----------+------------+---------+
|class    |0      |0          |=          |0           |0        |
+---------+-------+-----------+-----------+------------+---------+
|method   |0      |0          |=          |0           |0        |
+---------+-------+-----------+-----------+------------+---------+
|function |2      |2          |=          |0.00        |0.00     |
+---------+-------+-----------+-----------+------------+---------+

Raw metrics
-----------

+----------+-------+------+---------+-----------+
|type      |number |%     |previous |difference |
+==========+=======+======+=========+===========+
|code      |23     |71.88 |23       |=          |
+----------+-------+------+---------+-----------+
|docstring |0      |0.00  |0        |=          |
+----------+-------+------+---------+-----------+
|comment   |5      |15.62 |5        |=          |
+----------+-------+------+---------+-----------+
|empty     |4      |12.50 |4        |=          |
+----------+-------+------+---------+-----------+

Duplication
-----------

+-------------------------+------+---------+-----------+
|                         |now   |previous |difference |
+=========================+======+=========+===========+
|nb duplicated lines      |0     |0        |=          |
+-------------------------+------+---------+-----------+
|percent duplicated lines |0.000 |0.000    |=          |
+-------------------------+------+---------+-----------+

Messages by category
--------------------

+-----------+-------+---------+-----------+
|type       |number |previous |difference |
+===========+=======+=========+===========+
|convention |7      |7        |=          |
+-----------+-------+---------+-----------+
|refactor   |0      |0        |=          |
+-----------+-------+---------+-----------+
|warning    |0      |0        |=          |
+-----------+-------+---------+-----------+
|error      |0      |0        |=          |
+-----------+-------+---------+-----------+

Messages
--------

+--------------------+------------+
|message id          |occurrences |
+====================+============+
|missing-docstring   |3           |
+--------------------+------------+
|invalid-name        |3           |
+--------------------+------------+
|trailing-whitespace |1           |
+--------------------+------------+

Global evaluation
-----------------
Your code has been rated at 6.96/10 (previous run: 6.96/10, +0.00)

Python tricks

Counting with Counter

>>> from collections import Counter
>>> Counter(s=3, c=2, e=1, u=1)
Counter({'s': 3, 'c': 2, 'u': 1, 'e': 1})
>>> some_data=('c', '2', 2, 3, 5, 'c', 'd', 4, 5, 'd', 'd')
>>> Counter(some_data).most_common(2)
[('d', 3), ('c', 2)]
>>> some_data=['c', '2', 2, 3, 5, 'c', 'd', 4, 5, 'd', 'd']
>>> Counter(some_data).most_common(2)
[('d', 3), ('c', 2)]
>>> some_data={'c', '2', 2, 3, 5, 'c', 'd', 4, 5, 'd', 'd'}
>>> Counter(some_data).most_common(2)
[('c', 1), (3, 1)]

enumerate for index/value pairs

When you need both the index and the value, use enumerate. The examples below walk through a string and lists, then build a dictionary, pairing each element with its index.

>>> for i,j in enumerate('abcde'):
...     print i,j
...
0 a
1 b
2 c
3 d
4 e
>>> for i,j in enumerate([1,2,3,4]):
...     print i,j
...
0 1
1 2
2 3
3 4
>>> for i,j in enumerate([1,2,3,4],start=1):
...     print i,j
...
1 1
2 2
3 3
4 4

# Track element positions by key
from collections import defaultdict

s = "the quick brown fox jumps over the lazy dog"
words = s.split()
location = defaultdict(list)
for m, n in enumerate(words):
    location[n].append(m)
print location

# defaultdict(<type 'list'>, {'brown': [2], 'lazy': [7], 'over': [5], 'fox': [3],
# 'dog': [8], 'quick': [1], 'the': [0, 6], 'jumps': [4]})

Using os.path

os.path.join joins path components; its advantage is that it picks the correct separator for the system, "/" or "\".
os.path.split splits a path into dirname and basename and returns them as a tuple.
os.listdir returns a list of all entries under a path.

import os
path=os.path.abspath("ams8B.zip")
print path  # /Users/didi/Desktop/testhelp/ams8B.zip  # the folder does not actually contain ams8B.zip
print os.path.join("/".join(path.split("/")[:-1]),'ams8B.gz')  # /Users/didi/Desktop/testhelp/ams8B.gz
print os.path.join("home","user","test")  # home/user/test
# Split the path into dirname and basename, returned as a tuple; this way you need not care whether the separator is '\' or '/'
print os.path.split(path)  # ('/Users/didi/Desktop/testhelp', 'ams8B.zip')
print os.path.join(os.path.split(path)[0],'ams8B.gz')  #   /Users/didi/Desktop/testhelp/ams8B.gz
print os.getcwd()  # /Users/didi/Desktop/testhelp
print os.listdir(os.getcwd())  # ['t1.txt', 'test1.py', 'test2.py', '\xe6\x8e\xa5\xe9\xa9\xbeeta\xe5\x88\x86\xe5\xb8\x83.sh']

Make good use of list comprehensions

>>> foo = [2, 18, 9, 22, 17, 24, 8, 12, 27]
>>> print filter(lambda x: x % 3 == 0, foo)
[18, 9, 24, 12, 27]
>>>
>>> print map(lambda x: x * 2 + 10, foo)
[14, 46, 28, 54, 44, 58, 26, 34, 64]
>>>
>>> print reduce(lambda x, y: x + y, foo)
139

Using list comprehensions

>>> [x * 2 + 10 for x in foo]
[14, 46, 28, 54, 44, 58, 26, 34, 64]
>>> [x for x in foo if x % 3 == 0]
[18, 9, 24, 12, 27]

For lightweight loops, prefer a list comprehension where you can. Once you are fluent with them, they can replace map, filter and the like in many cases.

>>> foo = [2, 18, 9, 22, 17, 24, 8, 12, 27]
>>> foo
[2, 18, 9, 22, 17, 24, 8, 12, 27]
>>> ['>3' if i>3 else '<3' for i in foo]
['<3', '>3', '>3', '>3', '>3', '>3', '>3', '>3', '>3']
>>> t=map(lambda x:'<3' if x<3 else '>3',foo)
>>> t
['<3', '>3', '>3', '>3', '>3', '>3', '>3', '>3', '>3']
>>>

See also: "The simplest way to understand lambda, map, reduce, filter, and list comprehensions"

sort and sorted

# Function prototypes
sorted(iterable[, cmp[, key[, reverse]]])   # returns a new sorted list
s.sort([cmp[, key[, reverse]]])             # sorts the list in place and returns None

>>> persons = [{'name': 'Jon', 'age': 32}, {'name': 'Alan', 'age': 50}, {'name': 'Bob', 'age': 23}]
>>> sorted(persons, key=lambda x: (x['name'], -x['age']))
[{'name': 'Alan', 'age': 50}, {'name': 'Bob', 'age': 23}, {'name': 'Jon', 'age': 32}]
>>> a = (1, 2, 4, 2, 3)
>>> sorted(a)
[1, 2, 2, 3, 4]
>>> students = [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10),]
>>> sorted(students, key=lambda student : student[2])   # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

So if you need to keep the original list, use sorted(). sort() does not copy the list, so it uses less memory and is more efficient. Passing key is also more efficient than passing cmp: the cmp function is called many times during the sort, while key is computed only once per element.

On using cmp, this post finally gets to the point: "Customizing cmp for the sort() method in Python" (PythonTip, largest positive integer)

# cmp: for elements of other types, return a negative number if a is logically less than b, 0 if a equals b, and a positive number if a is greater than b; this decides whether the two swap places
def Reverse(a,b):
    return b-a

list_ = [5,3,4,1,2]
new = sorted(list_,cmp=Reverse)
print new  # [5, 4, 3, 2, 1]
# In this example, for 5 and 3 the difference b-a is negative, so the two do not swap places; the return value acts as the condition
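The cmp argument exists only in Python 2. As a hedged sketch for Python 3, the same descending sort can be written with functools.cmp_to_key, or more simply with a key function; reverse_cmp is a made-up name mirroring the comparison function above.

```python
from functools import cmp_to_key


def reverse_cmp(a, b):
    """Negative if a sorts first, 0 if equal, positive if b sorts first."""
    return b - a


numbers = [5, 3, 4, 1, 2]

# cmp_to_key wraps an old-style comparison function as a key function.
by_cmp = sorted(numbers, key=cmp_to_key(reverse_cmp))

# The same ordering with a plain key, evaluated once per element.
by_key = sorted(numbers, key=lambda n: -n)

# sorted() leaves its input untouched; list.sort() works in place
# and returns None.
in_place = list(numbers)
result = in_place.sort(reverse=True)
```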

Use traceback to trace deep errors

import traceback
try:
    pass  # do something
except Exception as ex:
    print ex
    traceback.print_exc()
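A runnable variant of the pattern above, assuming Python 3. The helper risky is hypothetical; traceback.format_exc() returns as a string the same text that print_exc() would print, which is convenient for logging.

```python
import traceback


def risky(values, index):
    """A made-up helper that fails for an out-of-range index."""
    return values[index]


try:
    risky([1, 2, 3], 10)
except IndexError as ex:
    message = str(ex)
    # The full stack trace as text, including the frame inside risky().
    details = traceback.format_exc()
```

The message alone says what went wrong; the traceback text also says where, which is what matters when the failure is several calls deep.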

Slicing with []

A full slice acts as a shallow copy.

>>> a=[1,2,3]
>>> b=a[:]
>>> b.append(4)
>>> b
[1, 2, 3, 4]
>>> a
[1, 2, 3]
>>> import copy
>>> c=copy.copy(a)
>>> c
[1, 2, 3]
>>> c.append(4)
>>> a
[1, 2, 3]
>>> c
[1, 2, 3, 4]
>>> d=a
>>> d.append(4)
>>> d
[1, 2, 3, 4]
>>> a
[1, 2, 3, 4]
# A word on deepcopy while we are here
# The theory is covered here: http://www.cnblogs.com/wait123/archive/2011/10/10/2206580.html
# Shallow copy
>>> import copy
>>> a = [[1,2],3,4]
>>> b = copy.copy(a)
>>> id(a)
54936008L
>>> id(b)
54964680L
>>> a is b
False
>>> b
[[1, 2], 3, 4]
>>> b[0][1]
2
>>> b[0][1]=2333
>>> b
[[1, 2333], 3, 4]
>>> a
[[1, 2333], 3, 4]
# deepcopy
>>> a = [[1,2],3,4]
>>> c = copy.deepcopy(a)
>>> id(a)
55104008L
>>> id(c)
54974536L
>>> a is c
False
>>> c
[[1, 2], 3, 4]
>>> c[0][1]
2
>>> c[0][1]=233
>>> c
[[1, 233], 3, 4]
>>> a
[[1, 2], 3, 4]  # does not change
# Now check that slicing behaves like a shallow copy
>>> d = a[:]
>>> d
[[1, 2], 3, 4]
>>> d[0][1]=0
>>> d
[[1, 0], 3, 4]
>>> a
[[1, 0], 3, 4]  # changes along with d

Reversing a sequence

>>> b
[1, 2, 3, 4]
>>> b[::-1]
[4, 3, 2, 1]

json.dumps()/loads() for storing dict structures

# json.dumps: dict to str
# json.loads: str to dict
import json
dict_ = {1:2, 3:4, "55":"66"}
json_str = json.dumps(dict_)
print type(json_str), json_str
# <type 'str'> {"55": "66", "1": 2, "3": 4}
print type(json.loads(json_str)), json.loads(json_str)
# <type 'dict'> {u'55': u'66', u'1': 2, u'3': 4}

Printing structures with pprint

from pprint import pprint
data = [(1,{'a':'A','b':'B','c':'C','d':'D'}),
        (2,{'e':'E','f':'F','g':'G','h':'H',
            'i':'I','j':'J','k':'K','l':'L'
            }),]
print data
pprint(data)

# Output of print
[(1, {'a': 'A', 'c': 'C', 'b': 'B', 'd': 'D'}), (2, {'e': 'E', 'g': 'G', 'f': 'F', 'i': 'I', 'h': 'H', 'k': 'K', 'j': 'J', 'l': 'L'})]
# Output of pprint
[(1, {'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'}),
 (2,
  {'e': 'E',
   'f': 'F',
   'g': 'G',
   'h': 'H',
   'i': 'I',
   'j': 'J',
   'k': 'K',
   'l': 'L'})]

Packing tuples with zip

Definition: zip([seq1, ...]) takes a series of iterables, packs their corresponding elements into tuples, and returns a list of those tuples. If the arguments differ in length, the returned list is as long as the shortest argument.

#!/usr/bin/python
# -*- coding: utf-8 -*-
name = ['mrlevo','hasky']
kind = ['human','dog']
z1 = [1,2,3]
z2 = [4,5,6]
result = zip(z1,z2) # zip
uzip= zip(*result) # unzip, splitting back into tuples
print "the zip:",result
# the zip: [(1, 4), (2, 5), (3, 6)]
print "the uzip:",uzip
#the uzip: [(1, 2, 3), (4, 5, 6)]
print "the uzip part of z1:%s\nthe uzip part of z2:%s"%(str(uzip[0]),str(uzip[1]))
#the uzip part of z1:(1, 2, 3)
#the uzip part of z2:(4, 5, 6)
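The description above matches Python 2, where zip returns a list. In Python 3 it returns a lazy iterator, so this sketch (assuming Python 3) wraps each call in list() to get the same results, and adds one call with unequal lengths to show the truncation rule.

```python
z1 = [1, 2, 3]
z2 = [4, 5, 6]

# zip() is lazy in Python 3; list() materializes the pairs.
pairs = list(zip(z1, z2))

# Unpacking the pairs with * reverses the operation.
unzipped = list(zip(*pairs))

# The result is as long as the shortest input.
short = list(zip([1, 2, 3], "ab"))
```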

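*args and **kw

In a function definition, *args simply says that positional arguments should be stored in the variable args. Python lets us name some parameters and capture all remaining positional arguments through args. In a function call, a variable marked with * means that its contents should be unpacked and passed as positional arguments.

def add(x, y):
    return x + y
list_ = [1,2]

add(list_[0], list_[1]) # 3
add(*list_) # 3

So *args means one of two things: at call time, the extra arguments are taken from an iterable; at definition time, the function accepts any number of positional arguments. The ** form, **kw, stands for a dictionary of key/value pairs, which means you no longer need to pull each value out of the dictionary by key.

dict_ = {'x': 1, 'y': 2, 'z':3}
def bar(x, y, z):
    return x + y + z

bar(**dict_)  # 6
bar(dict_['x'],dict_['y'],dict_['z'])  # 6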
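The examples above show unpacking at call time. The definition side can be sketched like this, assuming Python 3; describe and add are illustrative names only.

```python
def describe(first, *args, **kw):
    """Collect extra positional arguments in args and keyword ones in kw."""
    return first, args, kw


collected = describe(1, 2, 3, name="x", size=4)


def add(x, y, z=0):
    return x + y + z


# Both forms of unpacking can be combined in a single call.
total = add(*[1, 2], **{"z": 3})
```

Inside describe, args is a tuple and kw is a dict, whatever the caller passed.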

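The standard library's itertools: elegant iterators

It offers many functions; let's start with permutations, which came up in a written test.

permutations(p[, r]): returns an iterator over tuples, each an ordered arrangement of r elements taken from p.
For example: permutations('abc', 2) # pick two elements from 'abc', such as ab, bc, ...; returns a new iterator over all the results in order.

Note that order matters here, so both ab and ba are returned.

combinations('abc', 2) # pick two elements from 'abc', such as ab, bc, ...; returns a new iterator over all the results in order.

Note that order does not matter here, so of ab and ba only ab is returned.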
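The difference between the two functions is easiest to see side by side. A minimal sketch, assuming Python 3:

```python
from itertools import combinations, permutations

# Order matters: both "ab" and "ba" appear.
perms = ["".join(p) for p in permutations("abc", 2)]

# Order does not matter: only "ab" appears.
combs = ["".join(c) for c in combinations("abc", 2)]
```

Both functions return lazy iterators, so the list comprehensions are what actually produce the values.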

Here is a more practical example of an algorithm, although there is hardly any algorithmic structure to speak of

# Given a string, print all permutations of its characters in lexicographic order. For example, for the input abc, print every string that can be formed from a, b, c: abc, acb, bac, bca, cab and cba
# Uses itertools from the standard library
# Reference: http://blog.csdn.net/neweastsun/article/details/51965226
import itertools
def Permutation(ss):
    # write code here
    if not ss:
        return []
    return sorted(list(set(map(lambda x:''.join(x), itertools.permutations(ss)))))

Permutation('abc')
# ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']

For more, see: "Python Standard Library 13: Iterators (itertools)"

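Use with open() as f to read files

Reading text with f = open('path') has real limitations: first, the cost of loading everything into memory; second, the file stream has to be closed once you are done. Using with ... as solves both easily.

with open('path') as f:
    for line in f:
        pass  # do something
# for line in f treats the file object f as an iterable, and the system handles I/O buffering and memory management. For very large files this does not exhaust memory, because it reads line by line

with open('path') as f:
    for line in f.readlines():
        pass  # do something
# This approach puts the whole file into a list and then iterates over it, which does not work well for large amounts of data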
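A self-contained version of the line-by-line pattern, assuming Python 3. It writes a small temporary file first so that there is something to read; the file contents and variable names are made up for the example.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

line_count = 0
longest = ""
with open(path) as f:
    # Iterating over f reads one line at a time.
    for line in f:
        line_count += 1
        stripped = line.rstrip("\n")
        if len(stripped) > len(longest):
            longest = stripped

# The with block has already closed the file at this point.
closed_after_with = f.closed
os.remove(path)
```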

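Use yield to save memory

[1,2,3,4] is an iterable, and you iterate over it with for. A generator such as (x for x in range(4)) is an iterator too, but you can only iterate over it once. The reason is simple: its values are not all held in memory; each one is produced only when it is requested. yield is used much like the keyword return, and the function below returns a generator. During iteration, execution stops at yield and hands back a value; on the next iteration it resumes from the statement after the yield.

>>> mygenerator = (x*x for x in range(4))
>>> mygenerator
<generator object <genexpr> at 0x1121b55a0>
>>> mygenerator.next()
0
>>> mygenerator.next()
1
----------------------
>>> def generater(n):
...     for i in range(n):
...         yield i
...         print 'here'
...
>>> g = generater(5)
>>> g
<generator object generater at 0x10c801280> # any function containing yield becomes a generator and can be advanced with next()
>>> g.next()
0
>>> g.next()
here
1
>>> g.next()
here
2 # This shows how it works: on the first iteration, execution runs to the yield and returns without performing the next action; on the second iteration, it resumes at the statement after the previous yield, which produces the output seen above.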
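The single-pass nature of a generator, next to a list that can be walked repeatedly, can be sketched as follows. This assumes Python 3, where the built-in next(gen) replaces the gen.next() method used above; countdown is a hypothetical example function.

```python
def countdown(n):
    """Yield n, n-1, ..., 1, producing one value per request."""
    while n > 0:
        yield n
        n -= 1


gen = countdown(3)
first = next(gen)   # runs the body up to the first yield
rest = list(gen)    # consumes whatever is left
again = list(gen)   # the generator is exhausted now

numbers = [3, 2, 1]
twice = (list(numbers), list(numbers))  # a list can be iterated repeatedly
```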

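Another yield example comes from reading files. Calling read() directly on a file object can use an unpredictable amount of memory. A better approach is to read the file repeatedly through a fixed-size buffer. With yield, we can do this easily without writing an iterator class for the file:

# from Liao Xuefeng (廖雪峰)
def read_file(fpath):
    BLOCK_SIZE = 1024
    with open(fpath, 'rb') as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if block:
                yield block
            else:
                return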
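To see the chunked reader at work, here is a hedged usage sketch assuming Python 3. It restates read_file with the block size as a parameter (block_size is an addition for this example, not part of the original function) and runs it on a temporary 2500-byte file.

```python
import os
import tempfile


def read_file(fpath, block_size=1024):
    """Yield the file's contents in chunks of at most block_size bytes."""
    with open(fpath, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                return
            yield block


# Write 2500 bytes to a throwaway file, then read it back in chunks.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 2500)
    path = tmp.name

sizes = [len(block) for block in read_file(path)]
os.remove(path)
```

Only one block is held in memory at a time, however large the file is.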